RSRE  MEMORANDUM  No.  4562 


:  4  r 


TIrO 


RSRE 

MEMORANDUM  No.  4562 


ROYAL  SIGNALS  &  RADAR 
ESTABLISHMENT 


QR  DECOMPOSITION  BASED  ALGORITHMS  AND  ARCHITECTURES 
FOR  LEAST-SQUARES  ADAPTIVE  FILTERING 


Authors:  I  K  Proudler  &  J  G  McWhIrter 


PROCUREMENT  EXECUTIVE, 
MINISTRY  OF  DEFENCE, 
RSRE  MALVERN, 
WORCS. 


Thh  d 

ioi  pyhiic  fcle..ri-5  and  aalej  W 


fX'uinojit  hos  booa.appiovdd 


4^ 


t  ’» 

5  .  ■' 


A  ft 

v'  V.’ 


UNLIMITED 


92-06595 


CONDITIONS  OF  RELEASE 

0120146  308888 


. . .  DRICU 

COPYRIGHT  (c) 

1988 

CONTROLLER 
HMSO  LONDON 


• . . . . . . .  DRICY 

Repofts  quoted  are  not  necessarily  available  to  members  of  the  public  or  to  commercial 
organisations. 


DEFENCE  RESEARCH  AGENCY 
RSRE  Memorandum  4562 

TITLE;  QR  DECOMPOSITION  BASED  ALGORITHMS  AND  ARCHITECTURES 

FOR  LEAST-SQUARES  ADAPTIVE  FILTERING. 

AUTHORS:  I  K  Proudler  and  J  G  McWhirtO'. 

DATE:  January  1992 

SUMMARY 


In  this  memorandum  we  show  how  the  method  of  QR  decomposition  (QRD)  may  be  applied  to  the 
adaptive  filtering  and  beamforming  problems.  QR  decomposition  is  a  form  of  orthogonal  triangularisa- 
tion  which  is  particularly  useful  in  least  squares  computations  and  forms  the  basis  of  some  very  stable 
numerical  algorithms.  When  applied  to  the  problem  of  narrowband  adaptive  beamforming  where  the 
data  matrix,  in  general,  has  no  special  structure,  this  technique  leads  to  an  architecture  which  can  carry 
out  the  required  computations  in  parallel  using  a  triangular  array  of  relatively  simple  processing  ele¬ 
ments. 


The  problem  of  an  adaptive  time  series  filter  is  also  considered.  Here  the  data  veaors  exhibit  a  sim¬ 
ple  time-shift  invariance  and  the  COTesponding  dau  matrix  is  of  ToepliQ  structure.  In  this  case,  the  tri¬ 
angular  processor  array  is  known  to  be  very  inefficient.  Instead,  it  is  shown  how  the  Toeplitz  structure 
may  be  used  to  reduce  the  computational  complexity  of  the  QR  decomposition  technique.  The  resulting 
orthogonal  least  squares  lattice  and  "fast  Kalman”  algmithms  may  be  implemented  using  far  fewer 
processing  elements.  These  "fast”  QRD  algorithms  are  very  similar  to  the  more  convenbonal  ones  but. 
in  general,  are  found  to  have  superior  numerical  properties. 

The  more  general  problem  of  multi-cbtuuiel  adaptive  filtering  which  arises,  for  example,  in  broad¬ 
band  adaptive  beamfcaming  can  also  be  solved  using  the  QRD-based  least-squares  minimisation  tech¬ 
nique.  In  this  case  the  data  matrix  has  a  block  Toeplitz  structure  which  may  be  exploited  to  generate  an 
efficiem  multi-channel  fast  Kalman  or  least  squares  lattice  algorithm.  The  multi-channel  least  squares 
lattice  algorithm  may  be  implemented  using  a  lattice  of  the  triangular  processor  anays  and  so  it  consti¬ 
tutes  a  hybrid  solution  which  encompasses  the  algorithms  and  architectures  of  narrowband  adaptive 
beamfarming  and  adaptive  filtering  as  special  cases. 
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1  Introduction. 


In  this  memorandum  we  will  show  how  the  method  of  QR  decomposition  (QRD)  may  be  applied 
to  the  adaptive  filtering  and  beamforming  problems.  QR  decomposition  is  a  fcHin  of  orthogonal  trian- 
gularisation  which  is  particularly  useful  in  least  squares  computations  and  forms  the  basis  of  some  voy 
stable  numerical  algorithms. 

In  section  2,  we  show  how  the  method  of  QR  decomposition  by  Givens  rotations  may  be  applied 
to  the  problem  of  narrowband  adaptive  beamfmning  where  the  data  matrix,  in  general,  has  no  special 
structure.  In  particular,  it  is  shown  how  the  least  squares  computation  may  be  carried  out  in  parallel  us¬ 
ing  a  triangular  array  of  relatively  simple  processing  elements. 

In  section  3,  we  consida  the  problem  of  an  adaptive  time  series  filter  for  which  the  data  vectors 
exhibit  a  simple  time-shift  invariance  and  the  corresponding  data  matrix  is  of  Toqrlitz  structure.  In  this 
case,  the  triangular  processor  array  described  in  section  2  is  known  to  be  very  inefficient.  Instead,  it  is 
shown  how  the  Toeplitz  structure  may  be  used  to  reduce  the  computational  complexity  of  the  QR  de¬ 
composition  technique.  The  resulting  orthogonal  least  squares  lattice  and  “fast  Kalman"  algonthms.  de¬ 
rived  in  section  3,  may  be  implemented  using  far  fewo-  processing  elements.  These  “fast"  QRD  algo¬ 
rithms  are  very  similar  to  the  more  conventional  ones  (12]  but.  in  general,  are  found  to  have  superior 
numerical  properties. 

In  section  4,  we  consider  the  more  general  problem  of  multi-channel  adaptive  filtering  which  aris¬ 
es.  for  example,  in  broadband  adaptive  beamfctrming.  In  this  case  the  data  matrix  has  a  block  Toeplitz 
structure  which  may  be  exploited  to  generate  an  efficient  multi-channel  fast  Kalman  or  least  squares 
lattice  algorithm.  The  multi-chaimel  least  squares  lathee  algorithm  may  be  implemented  using  a  lathee 
of  the  triangular  processor  anays  discussed  in  sechon  2  and  so  it  conshtutes  a  hybrid  solution  which 
encompasses  the  algorithms  and  architectures  of  sechons  2  and  3  as  special  cases.  The  appendix  con¬ 
tains  listings  of  the  various  algorithms  in  an  ALGOL-like  code. 

The  content  of  this  memorandum  appears,  with  slight  modificahons.  as  chapter  7  -  “The  QR  Fam¬ 
ily"  -  of  the  book  enhtled  “Adaphve  System  Identification  and  Signal  Processing  Algorithms",  edited 
by  S.  Theodoridis  and  N.  Kaloupsidis,  and  published  by  Prenhce-Hall. 

2  Narrow-band  Beamforming. 

2.1  QR  DecomposKion 

A  narrow-band  beamformer  is  essenhally  a  spahal  filter  and  the  corresponding  adaptive  beam¬ 
forming  problem  may  be  formulated  in  terms  of  least  squares  minimisahon  [30].  A  fundamental  struc¬ 
ture  in  least  squares  minimisation  problems  is  an  adaphve  linear  combiner  of  the  type  illustrated  in  fig¬ 
ure  1 .  This  may  be  applied  directly  to  the  problem  of  narrowband  adaphve  beamforming.  lesulhng  in 
the  so-called  generalised  sidelobe  canceller  (30].  The  combined  output  from  an  adaphve  linear  combin- 
a  at  sample  ume  t,  is  denoted  by 
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X(t)  y(t) 


Figure  1  Canonical  Adaptive  Linear  Combiner 


e(t)  =  2t'''(tj)w+y(tj)  (1) 


where  x(t,)  is  the  p-element  (complex)  vector  of  auxiUary  signals  at  time  tj  and  y(tj)  is  the  corresponding 
sample  of  the  primary  signal.  The  residual  signal  power  at  time  n  is  estimated  by  the  quantity 


E„(w)  =  ;e(n)  ^ 

(2) 

where 

e(n)  =  B(n)[e(I),e{2) . e(n)]^ 

(3) 

and  the  diagonal  matrix 

B(n)  -  diag[P""'.P"'^ . I]  (4) 

constitutes  an  exponential  window  with  0  <  P  ^  1 .  From  equation  ( 1 )  it  foUows  that  the  vector  of  resid¬ 
uals  may  be  written  in  the  form 


e(n)  «  X(n).w  +  y(n) 


(5) 


where 
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(6) 


X(n)  =  and  y(n)  =  B(n) 


ix^(n); 


iyO) 

*y(2) 


[y(ny 


X(n)  is  simply  the  matrix  of  all  data  samples  received  by  the  combiner  up  to  time  n.  and  y(n)  is  the  cor¬ 
responding  vector  of  data  in  the  primary  or  refCTence  channel.  For  convenience,  the  matrix  B(n)  has  sim¬ 
ply  been  absorbed  into  the  definition  of  6(n),  y(n)  and  X(n). 

The  least  squares  weight  vector  w(n)  is  simply  the  one  which  minimises  E„(^  and  the  convention¬ 
al  approach  [12]  to  this  problem  involves  explicit  computation  of  the  data  covariance  matrix 
M(n)  -  X*^(n)X(n)  and,  as  a  result,  the  condition  number  of  the  problem  is  squared.  For  a  given  finite 
wordlength.  this  leads  to  a  considerable  loss  in  performance  and  should  be  avoided  if  possible.  Note  that 
the  computation  of  w(n)  is  based  on  all  data  received  up  to  time  n. 

An  alternative  approach  to  the  least-squares  estimation  problem  is  the  method  of  QR  decomposi¬ 
tion  which  constitutes  a  form  of  orthogonal  triangularisation  and  has  particularly  good  numerical  prop¬ 
erties.  It  wiU  be  generalised  here  to  the  case  of  complex  data  as  required,  for  example,  in  narrowband 
adaptive  beamforming.  An  (n  x  n)  unitary  matrix  Q(n)  is  generated  such  that 


Q(n)X(n)  = 


(7) 


where  R(n)  is  a  p  x  p  upper  tnangular  matrix.  Then,  since  Q(n)  is  unitary,  we  have 

M(n)  =  x”(n)X(n)  =  x”(n)Q”(n)Q(n)X(n)  =  R”(n)R(n)  (8) 


so  that  the  triangular  matrix  R(n)  is  the  Cholesky  (square-root)  faaor  of  the  data  covariance  matrix 
M(n). 


Now.  again  by  virtue  of  the  unitary  nature  of  the  matrix  Q(n).  we  have 


where 


I  e(n)  l  «  i  <3(n)e(n):i 


|’^Hw(n)  + 

Lo  ! 


u(ny  ' 

y(n) 


-  Q(n)y(n) 

!  y(n); 


It  follows  that  the  least  squares  weight  yectOT  Mn)  must  satisfy  the  equation 


(9) 


(10) 
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R(n)w(n)  +  u(n)  =  p 


(11) 


and  hence 


E„(w)  =  i|y(n);  ^ 


(12) 


Since  the  matrix  R(n)  is  uppo'  triangular,  equation  (1 1)  is  much  easier  to  solve  than  the  Gauss  nor¬ 
mal  equations  [12].  The  weight  vector  w(n)  may  be  derived  quite  simply  by  a  process  of  back-substitu¬ 
tion.  Equation  (1 1)  is  also  much  better  conditioned  since  the  condition  number  of  R(n)  is  identical  to 
that  of  the  original  data  matrix  X(n),  the  two  being  related  by  a  unitary  transformabon. 

2.2  Givens  Rotations 

The  triangulansation  process  may  be  carried  out  using  either  Householder  transformations!  10]  or 
Givens  rotations[9].  However,  the  Givens  rotation  method  is  particularly  suitable  for  adaptive  filtering 
since  it  leads  to  a  very  efficient  algorithm  whaeby  the  triangularisation  process  is  recursively  updated 
as  each  new  row  of  data  enters  the  combiner.  Assume  that  the  matrix  X(n-1 )  has  already  been  reduced 
to  triangular  form  by  the  unitary  transformation 


Q(n-l)X(n-l) 


jR(n-iy 

!  o  I 


and  define  the  unitary  matrix 


Q(n-l) 


Q(n-l) 


0 

1 


Clearly. 


(3(n-l)X(n)  =  Q(n-1);^^^"“'^. 

xT(n) 


jPR(n-  1)| 

i  O  ' 


I  x^(n) 


(13) 


(14) 


(15) 


and  hence  the  triangularisation  of  X(n)  may  be  completed  by  using  a  sequence  of  (complex)  Givens  ro¬ 
tations  to  eliminate  the  vector  jt^fn).  Each  Givens  rotation  is  an  elementary  unitary  transformation  of 
the  form 


c  S*  jO  ■■ 

.  0.  Pr,  . 

..Pr,.. 

Jo.. 

O.r/,. 

..  r/  ... 

-s  c  _ lO  . . 

.  0,  X,  , 

.  .  x^  . . 

lo.. 

L- 

.  0. 0  . 

.  .  X/  .  . J 

where 


c^-K's;^ 


(17) 
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and  the  cosine  parameter  is  assumed  to  be  real  without  loss  of  generality.  Qearly  we  require 


The  sequence  of  rotations  is  applied  as  follows.  The  p  element  vector  x^(n)  is  rotated  with  the  first 
row  of  PR(n  -  1)  so  that  the  leading  element  of  x^(n)  is  eliminated,  and  a  reduced  vector  x'^(n)  is  pro¬ 
duced.  Note  that  the  first  row  of  R(n  -  1)  is  modified  in  the  process.  The  (p-l)-element  reduced  vector 
x'^(n)  is  then  rotated  with  the  second  row  of  PR(n  -  1)  so  that  the  leading  elemem  of  x'^(n)  is  eliminat¬ 
ed.  and  so  on  until  every  element  of  the  data  vector  has  been  aimihilated.  The  resulting  triangular  matrix 
R(n)  then  corresponds  to  a  complete  triangularisation  of  the  matrix  X(n)  as  defined  in  equation  (7).  The 
corresponding  unitary  matrix  Q(n)  is  simply  given  by  the  recursive  expression 

Q(n)  =  0{n)Q(n-l)  (19) 

where  Q(n)  is  a  unitary  matrix  representing  the  sequence  of  Givens  rotation  operations  described  above, 
i.e. 


"pR(n-l^  rR(n)1 
Q(n)'  O  i  =  O  j 
i  xT(n)  J  :  pTJ 


I;  is  not  difficult  to  deduce  in  addition  that 


Q(n) 

Pu(n-  1)~ 

y(n)  j 

Py(n-l),  = 

Py(n-  !)■  = 

y(n)  j 

odn)  J 

y(n) 

y(n)^ 


(20) 


(21) 


and  this  shows  how  the  vector  ii(n)  can  be  updated  recursively  using  the  same  sequence  of  Givens  ro¬ 
tations.  The  least  squares  weight  vector  s(n)  may  then  be  derived  by  solving  equation  (11).  The  solution 
is  not  defined,  of  course,  if  n  <  p  but  the  recursive  triangularisation  process  may.  nonetheless,  be  initial¬ 
ised  by  setting  R(0)  =  O  and  ufO) «  p. 

2.3  Parallel  Implementation 

The  Givens  rotation  algorithm  described  above  may  be  implemented  in  parallel  using  a  triangular 
processOT  array  of  the  type  illusnated  in  figure  2  for  the  case  p  «  4.  It  comprises  three  distinct  sections 
-  the  basic  triangular  array  labelled  ABC.  the  right  band  column  of  cells  labelled  DE  and  the  final 
processing  cell  labelled  F. 

At  time  n-1.  each  cell  within  the  basic  triangular  array  stores  one  element  of  the  triangular  matrix 
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R(n  -  1)  and  each  cell  in  the  right  hand  column  DE  stores  one  element  of  the  corresponding  vector 
u(n-l).  On  the  next  clock  cycle,  the  data  vector  [x^(n).  y(n)]  is  input  to  the  top  of  the  array  as  shown. 
Each  row  of  cells  within  the  basic  triangular  array  performs  a  Givens  rotation  between  one  row  of  the 
stored  triangular  matrix  and  a  vector  of  data  received  from  above  so  that  the  leading  element  of  the  re¬ 
ceived  vector  is  eliminated.  The  reduced  data  vector  is  then  passed  downwards  through  the  array.  The 
boundary  cdl  in  each  row  computes  the  appropriate  rotation  parameters  and  passes  them  on  to  the  right 
so  that  the  internal  cells  can  apply  the  same  rotation  to  all  other  elements  in  the  received  data  vector. 
This  arrangement  ensures  that  as  the  input  data  veaor  s^(n)  moves  down  through  the  array  it  interacts 
with  the  |»eviously  stored  triangular  matrix  R(n  -  1)  and  undergoes  the  sequence  of  rotations  0(n) 
described  in  section  2.2,  All  of  its  elmnents  are  thereby  eliminated  (one  on  each  row  of  the  array)  and 
the  updated  triangular  matrix  R(n)  is  generated  and  stored  in  the  process. 

As  the  corresponding  iq>ut  sample  y(n)  moves  down  through  the  right  band  column  of  cells,  it 
undergoes  the  same  sequence  of  Givens  rotations  inteiactit^  with  the  stored  vector  ii(n-l )  and  thereby 
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Boundary  Cell 


IFXi„  =  OTHEN 

c  <-  1 ;  s  <-  0:  r «-  Pr; 

ELSE 


((Pr)/r'); 


s«-  (x,„/r'); 


r<-  r ; 
ENDIF 


•CYi, 


Internal  Cell 
x,„ 


(c.s) 


i 


(c.s) 


4 


’‘ou.^-cx.n-spr 

r<-s*x.„  +  cpr 


Figure  3  Processing  Elements  for  Triangular  QRD  Array 


generating  the  updated  vector  u(n).  The  resulting  output,  which  emerges  from  the  bonom  cell  in  the 
right  hand  column,  is  simply  the  value  of  the  parameter  a(n)  in  equation  (21). 


The  function  of  the  rotation  cells  is  specified  in  figure  3  and  follows  immediately  from  equations 
(16)  and  (18).  The  boundary  cell  includes  an  additional  parameter  y  which  is  not  required  for  the  basic 
Givens  rotation  but  will  be  explained  in  section  2.3  where  the  function  of  the  final  cell  F  is  also  made 
clear.  Since  the  matrix  R  is  initialised  to  zero,  it  can  be  seen  that  the  elements  on  its  leading  diagonal 
(i.e.  the  values  of  r  within  the  boundary  cells)  may  be  treated  as  real  variables  throughout  the  compu¬ 
tation  and  so  the  number  of  real  arithmetic  opo'ations  performed  within  the  boundary  cell  is.  surpris¬ 
ingly.  less  than  that  required  for  an  into'nal  cell. 


For  simplicity  and  ease  of  discussion,  we  have  assumed  that  all  cells  of  the  array  in  figure  2  operate 
on  the  same  input  data  vector  during  a  given  clock  cycle.  The  critical  path  for  the  array  therefore  in¬ 
volves  2p+l  cells  and  the  maximum  rate  at  which  dau  vectors  can  be  input  is  ~  I  A(2p+1  )T)  where  T  is 
the  typical  processing  time  for  a  single  cell.  When  Gentleman  and  Kung[8]  first  proposed  the  triangular 
array  in  figure  2.  they  showed  how  it  could  be  fully  pipelined  by  introducing  latches  to  store  the  outputs 
generated  by  each  cell  before  they  are  passed  on  as  inputs  to  the  neighbouring  cells.  The  resulting  systo¬ 
lic  array  can  thieve  an  input  clock  rate  -  1  A"  and  only  requires  nearest  neighbour  cell  interconnections 
which  is  highly  advantageous  for  VLSI  implemenution. 


The  systolic  array  may  be  generated  by  cutting  the  diagram  in  figure  2  along  all  diagonals  parallel 


to  the  one  indicated  and  introducing  a  storage  element  where  each  data  path  crosses  a  cut  line.  Note  that 
these  cut  lines  also  intersect  the  input  data  paths  and  so  each  input  data  vector  enters  the  triangular  array 
in  a  skewed  or  staggo'ed  manner.  Qearly ,  figure  2  provides  a  sufficient  description  of  the  parallel  algo¬ 
rithm  without  including  the  detailed  pipelining  and  timing  aspects  associated  with  a  systolic  or  wave¬ 
front  array.  This  relative  simplicity  means  that  the  parallel  processor  can  be  represented  more  easily  in 
block  diagrammatic  form  later  in  this  memorandum. 

2.4  Square-Root-Free  Version 

Gentleman!?]  and  Hammarling[l  1]  have  derived  extremely  efficiem  QR  decomposition  algo¬ 
rithms  based  on  modified  Givens  rotations  which  require  no  square-root  operation.  The  essence  of  these 
square-root-lree  algorithms  is  a  factorisation  of  the  form 

R(n)  =  D'''^n)R(n)  (22) 

where  D(n)  =  diag  ir^jfnf.r^jfn) . r^p(n)i  (23) 


and  R(n)  is  a  unit  upper  triangular  matrix.  The  complex  Givens  rotation  in  equation  ( 1 6)  then  takes  the 
form 


\  r  !  0  . . .  0,  Pvd  . . .  P  .df.  ...  I  '0  ...  0,  .  /i'r'  . . .  i 

I  ‘  “  I  ^ 

-s  c.  0...  0.  ./6x,  ...  -.'hx^.-.i  jO...  0,0  ...  .,'6'x^' ...j 


(24) 


where  ,  d  represents  an  element  in  the  diagonal  matrix  D'^(n).  and  denotes  an  associated  off-diag¬ 
onal  element  from  the  matrix  R(n).  Note  that  each  element  x,^  of  the  data  vectw  has  been  expressed  in 
the  form  ^  bx,^  where  6  is  a  scaling  factor  which  changes  value  as  a  result  of  the  rotation. 

The  square-root-fiee  algorithm  may  also  be  implemented  using  the  type  of  triangular  processor 
array  depicted  in  figure  2  but  with  the  rotation  cells  as  defined  in  figure  4.  In  this  case,  the  boundary 
cells  store  and  update  elements  of  D(n)  (i.e.  the  squares  of  the  diagonal  elements  of  R(n) )  while  the 
internal  cells  store  and  update  the  off-diagonal  elements  of  R{n) .  The  function  of  these  cells  may  be 
deduced  in  a  straightforward  manner  starting  from  equations  (24)  and  (17)  and  noting  that  the  values  of 
c  and  s  need  not  be  computed  explicitly.  Note  that  the  scale  quantities  6  are  updated  only  at  the  boundary 
ceUs  and  passed  diagonally  from  one  row  to  the  next  Note  also  that  the  parameters  d  and  6  in  each 
boundary  cell  may  be  treated  as  real  variables  throughout  the  computation  assuming  that  they  are  as¬ 
signed  real  values  iititially.  For  the  purposes  of  normal  least  squares  processing,  the  6  parameter  is  ini¬ 
tialised  to  unity  so  that  on  input  to  the  array.  ;(n)  >  ][(n).  However,  as  pointed  out  by  Gentleman!?], 
the  6  parameter  associated  with  each  input  vector  may  be  assigned  an  arbitrary  initial  value  which  smes 
to  wdght  that  row  of  data  accordingly.  The  square-root-free  airay  may  thus  be  used  to  perform  a  general 
weighted  least  squares  computation. 
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As  with  the  conventional  Givens  rotation  algorithm,  cells  in  the  right  hand  column  perform 
the  same  function  as  those  internal  to  the  triangular  array.  Thus,  allowing  for  the  factorisation  shown  in 
equation  (22).  these  cells  update  elements  of  the  vector  u(n).  where 

u(n)  =  D'^^n)u(n)  (23) 

(t  follows  from  equations  (22).  (23).  and  (11)  that  the  weight  vector  w(n)  is  given  by 

R(n)w(n)  p(n)  »  p  (26) 

and  may  be  obtained,  as  before,  by  a  simple  process  of  back  substitution.  Figure  4  specifies  the  cell  func¬ 
tions  required  for  one  particular  version  of  the  square-root-ffee  algorithm.  This  version,  which  requires 
2  multiplications  and  2  additions  per  cycle  to  be  perfotmed  in  each  internal  cell,  was  first  suggested  by 
Golub  and  reported  by  Gentleman[7|.  We  have  found  it  to  be  particularly  staMe  and  accurate  throughout 
an  extensive  programme  of  finite  wordlength  adaptive  filtering  and  beamforming  computa- 
tions[23][32].  These  observations  are  supported  by  the  work  of  Ling.  Manolakis  and  Ftoakis[13]  who 
derived  an  equivalent  algorithm  (the  “error-feedback”  algorithm)  based  on  a  recuni  ve  form  of  the  mod¬ 
ified  Gram-Schmidt  orthogonalisanon  procedure  -  see  section  2.8. 

The  "square-root-ffee"  algorithm  described  above  is  in  fact  an  algorithm  for  calculating  the  re- 
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Method  of  Calculation  Method  of  Application 

Using  square-roots  (SQ)  S/  Feedforward  (FF) 

Square-root-free  (SF)  Feedback  of  output  (FB) 

Feedback  of  stored  parameter  (MFB) 

Table  1  Options  for  Implementing  a  Givens  Rotation 


quired  planar  rotation.  In  the  QRD-based  least  squares  minimisation  problem  these  rotations  also  have 
to  be  applied  to  various  vectors.  In  section  2.3  the  “natural"  implementation  of  the  Givens  rotation  (in¬ 
volving  the  computation  of  square-roots)  was  applied  to  various  vectors  in  a  feedforward  manner:  the 
two  components  of  the  rotated  vector  are  calculated  ind^ndently  on  the  basis  of  the  two  input  com¬ 
ponents.  The  square-root-free  algorithm  presented  above  applies  the  rotation  in  a  feedback  mode:  one 
output  is  now  dependent  on  one  input  and  the  other  output  It  follows  that  there  exists  another  feedback 
algorithm  where  the  parameter  that  is  fed  back  is  the  stored  quantity  f  -  see  figure  4  -  rather  than  the 
ceU  output 

Combined  with  the  two  methods  for  calculating  the  rckanon  parameters,  the  feedforward/feedback 
choice  results  in  six  different  vanations:  see  table  I .  The  computer  simulations  of  section  3.8  show  that, 
of  these  four  possible  variations,  the  one  that  is  equivalent  to  the  RMGS  error-feedback  algorithm 
(square-root-free  with  feedback)  performs  the  best  in  terms  of  numerical  stability.  It  is  worth  emphasis¬ 
ing  that  the  basic  architecture  (figure  2)  is  not  affected  by  the  choice  of  rotation  technique  so  that  the 
only  difference  between  the  various  options  is  a  change  of  PE’s. 

2.5  Direct  Residual  Extraction 

In  many  least  squares  problems,  and  parncularly  in  adaptive  noise  cancellation,  the  least-squares 
weight  vector  w(n)  is  not  the  principal  object  of  interest.  Of  more  direct  concern  is  the  corresponding 
residual,  since  this  constitutes  the  noise-reduced  output  signal  from  the  adaptive  combiner[32].  In  this 
section,  we  will  show  how  the  “a-posteriori"  least  squares  residual 

e(n.  n)  ■  x^(n)w(n)  +  y(n)  (27) 


which  depends  on  the  most  up-to-date  weight  vector  £(n).  may  be  obtained  directly  from  the  type  of 
processor  array  described  in  section  2.3  without  computing  w(n)  explicitly!  18]. 

In  order  to  proceed,  it  is  necessary  to  consider  the  structure  of  the  nxn  matrix  Ofn)  which, 
from  equation  (20).  may  be  expressed  in  the  form 


0<n) 


A(n)  O  »(n), 
O  I  9  j 
Ib’fn)  o^  Y(n): 


(28) 
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t.e.  ')<n)  IS  the  product  of  the  cosine  terms  in  the  p  rotations  represented  by  Q(n).  Multiplying  both  sides 
of  equation  (20)  by  0  (n)  >><<1  noting  that  the  matrix  0(n)  is  unitary,  it  is  now  easy  to  deduce  that 

x^(n)  =  a%)R(n)  (32) 

and  similarly,  from  equaaon  (21 )  we  have 

y(n)  «  a”(n)y(n)-i--f(n)a(n)  (33) 

Substituting  equations  (32)  and  (33)  into  equation  (27)  leads  to  the  expression 

e(n.  n)  -  »“(n)R(n)ar(n)  +  »”(n)u(n)  +  ■|((n)(x(n)  (34) 

and.  from  equation  ( 1 1),  it  follows  immediately  that 

e(n.  n)  *  -)(n)a(n)  (33) 

Now.  as  noted  in  section  2.3.  when  the  conventiona]  Givens  rotation  algorithm  is  employed.  dK 
parameter  a(n)  is  generated  quite  naturally  within  the  inangularisation  process  and  simply  emerges 


from  the  bottom  cell  of  the  right  hand  column  in  figure  2:  because  of  its  relationship  with  the  a-posteriori 
residual  e(o.  n).  as  given  in  equation  (33),  the  parameter  a(n)  ■$  known  as  the  angle-normalised  resid¬ 
ual.  The  scalar  y(n),  as  given  by  equation  (31),  may  also  be  computed  very  simply.  The  product  of  co¬ 
sine  terms  is  generated  recursively  by  the  parameter  y  as  it  passes  from  cell  to  cell  along  the  chain  of 
boundary  processors.  The  simple  product  required  to  fc»m  the  a-posteriori  residual  as  given  in  equation 
(35)  is  computed  by  the  final  processing  cell  F  in  figure  2. 

The  direct  residual  extraction  technique  obviously  avoids  a  lot  of  unnecessary  computation  pro¬ 
vided  that  the  weight  vector  is  not  required  explicitly.  As  a  result,  the  ova's!!  processing  architecture  is 
greatly  simplified.  Thae  is  no  need  for  a  separate  back-substimtion  processor  or  any  sophisticated  con¬ 
trol  circuitry  to  ensure  that  the  contents  of  the  triangular  atray  are  input  to  the  back-subshmtion  proces- 
sa  in  the  correct  sequence.  Consequently,  it  is  much  easia  to  maintain  a  regular  pipelined  data  flow.  A 
less  obvious,  but  arguably  more  impcatant,  advantage  of  direct  residual  extraction  is  the  improved  nu¬ 
merical  stability  which  it  offas.  This  derives  from  the  fact  that  computing  the  weight  vector  exphcitly 
requires  the  solution  of  a  linear  invase  problem  which  may  be  ill-condiboned  in  circumstances  wbae 
the  optimum  weight  vector  is  not  well  defined.  The  least  squares  residual,  however,  is  always  well  de¬ 
fined  and  can  be  computed  reliably.  Note,  for  example,  that  the  ccrrect  (zero)  residual  is  obtained  even 
dunng  the  first  few  processing  cycles  when  the  data  matrix  is  not  full  rank  and  the  corresponding  weight 
vector  cannot  be  uniquely  defined.  This  type  of  unconditional  stability,  which  may  be  contrasted  with 
that  of  the  traditional  RLS  algorithm  (see  Haykin  [12])  and  avoids  the  need  for  "persistent  excitation", 
IS  extremely  important  in  the  context  of  real  time  signal  processing. 

The  a-posteriori  residual  may  be  computed  in  a  similar  manner  when  the  square-root-free  algo- 
nthm  is  employed.  The  square-root-free  algorithm  delivers  from  the  bottom  cell  in  the  right-hand  col¬ 
umn  a  scalar  e(n)  given  by 


6'''^(n)e(n)  -  a(n)  (36) 

whae  6'^(n)  is  the  scaling  parameter  appropriate  to  the  p"*  row  at  time  n  and  it  has  been  assumed  that 
&(n)  is  initiahsed  to  unity  on  input  to  the  array.  From  equations  (33)  and  (36)  it  follows  that 

e(n,  n)  -  7(0)6' '^n)e<n)  (37) 

whae  7(n)  is  the  product  of  cosine  terms  which  arise  in  the  conventional  Givens  rotation  algorithm. 
Howeva.  6(n).  as  computed  by  the  boundary  processors  for  the  square-root-free  algorithm,  is  simply 
the  product  of  all  the  c  terms  and  it  can  easily  be  shown  that  this  is  equivalent  to  the  produa  of  the  con¬ 
ventional  cosine  parameters  squared.  Thus, 


6(n)  ••  r^fn) 

(38) 

and 

e(n.n)  ■  6(a)e(n) 

(39) 
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Hence  the  a-posteriori  residual  e(nji)  may  also  be  obtained  directly  irom  the  square-root-£ree  proc¬ 
essor  array,  using  a  final  multiplier  ceU  F  as  illustrated  in  figure  2. 

A  second  form  of  residual,  which  occurs  naturally  in  least  squares  algorithms  such  as  the  least 
squares  lattice  (see  Haykin  [12])  is  the  a  priori  residual,  denoted  e(n,  n  -  1).  Defined  in  terms  of  the 
previously  computed  weight  vector  w(n-l )  and  the  latest  data  [x^(n).  y(n)].  it  takes  the  form 

e(n,  n  -  1)  «  x^(n)w(n  -  1)  +  y(n).  (40) 

This  residual  may  also  be  obtained  directly  from  the  triangular  processor  array  as  we  now  show.  By  sub¬ 
stituting  the  decomposition  for  Q(n)  given  in  equation  (28)  in  the  time  update  relation  (equation  (20)) 
we  have 

b^(n)PR{n  -  1)  +  Y(n)xT(n)  =  p  (41) 

Eliminating  the  vector  x^(n)  between  equations  (40)  and  (41)  we  find 

e(n,n-l)»  (-b^(n)PR(n- l)w(n- l)  +  )(n)y(n))/’y<n)  (42) 

Again  by  using  the  decomposition  given  in  equation  (28).  this  ume  with  equation  (21).  an  other 
relauonship  can  be  obtained: 

b^(n)Pu(n- 1)  +  Y(n)y(n)  »  a(n)  (43) 

By  eliminating  the  term  Y(n)y(n)  between  equation  (42)  and  (43)  we  have  that 

e(n.n-l)“  (-b^(n)PR(n- l)w(n- l)  +  ci(n)-b^(n)Pu(n- 1))/Y(n)  (44) 

and.  from  equation  ( 1 1 ).  it  follows  immediately  that 

e(n.  n- 1)  ■  a(n)/Y(n)  (43) 

Thus  the  a  priori  residual  is  computed  from  the  same  quantities  as  the  a  posteriori  residual.  It  follows 
immediuely  from  equations  (33)  and  (43)  that  they  are  related  by  the  expression 

e(n.  n)  «  T(*(n)e(n.  n  -  1)  (46) 

Note  that  by  eliminating  the  term  ifn)  between  equations  (33)  and  (43)  we  find  that 

a(n)  ■  ./e(n.  n  -  l)e(n.  n)  (47) 
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so  that  the  angle  nonnalised  residual  a(n)  can  be  viewed  as  the  geometric  mean  of  the  a  priori  and  the 
a  posteriori  residuals. 


From  equations  (38),  (39)  and  (46)  we  see  that 

e(n)  =  e(n,  n- 1)  (48) 

so  that  the  scalar  which  emerges  naturally  from  the  bottom  cell  in  the  nght-hand  column  of  the  square- 
root-free  processor  array  is  the  corresponding  a  priori  residual.  Finally,  note  that 

e(n,  n) 

and  hence  we  see  that  5(n)  is  equal  to  the  so-called  likelihood  variable  (see  Haykin  [  1 2]). 

2.6  Weight  freezing  and  flushing 

It  has  been  shown  that  if  a  data  vector  [x^(n),  y(n)]  is  input  to  the  triangular  processor  array 
in  figure  2,  the  corresponding  a-posteriori  least  squares  residual  e(n,n)  emerges  from  the  final  cell  F.  In 
order  to  achieve  this  result,  the  array  performs  two  distinct  functions; 

( 1 )  It  generates  the  updated  triangular  matrix  R(n)  and  corresponding  vector  u(n)  (or  D(n),  R(n) 
and  u(n)  for  the  square-root-free  algorithm)  and  hence,  implicitly,  the  updated  weight  vector 
w(n). 

(2)  It  acts  as  a  simple  filter  which  applies  the  updated  weight  vector  to  the  input  data  according 
to  equation  (27). 

If  the  array  is  subsequently  "frozen"  in  order  to  suppress  any  further  update  of  the  stored  values,  but 
allowed  to  function  normally  in  all  other  respects,  it  will  continue  to  perform  the  filtering  operation 
without  affecting  the  implicit  weight  vector  w(n).  Thus,  in  response  to  an  input  of  the  form  [x^.  y],  the 
frozen  network  will  produce  the  output  residual 

e-j''^w(n)  +  y  (50) 

Equation  (30)  may  be  verified  directly  by  considering  the  frozen  array  as  a  combination  of  basic 
matrix  operators.  Consider  first  the  basic  triangular  array  ABC  in  figure 2.  In  frozen  mode,  the  boundary 
and  internal  cells  perform  the  reduced  processing  functions  defined  in  figure  3.  Now  consider  the  effect 
of  the  simplified  network  upon  a  row  vector  input  from  above  in  the  usual  manner.  This  will  give 
nse  to  a  column  vector  g  which  emerges  from  the  right.  It  is  straightforward  to  verify  that  the  input  vec¬ 
tor  s  is  related  to  the  output  vector  z  by  means  of  the  matrix  transformation 

X  ■  r’’^z  (51) 
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where  R  is  the  upper  triangular  matrix  stored  within  the  array.  For  example,  it  is  clear  that 
Z|  =  x,/r,,  and  =  (*2  “ '’12^1 )  "  ''22 

i.e  X|  =  r,,z,  and  Xj  =  ''12^1  ^22^2  (53) 

Assuming  that  R  is  non-singular  (i.e.  no  diagonal  element  of  R  is  zero)  it  follows  that 

z  =  R"'^x  (54) 

and  so  the  frozen  triangular  array  may  be  regarded  as  an  R~^  matrix  operator  as  depicted  schematically 
in  figure  6. 
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Square-root-free  Frozen  Processing  Elements 

Now  consider  the  nght  hand  column  of  cells  DE  in  figure  2.  Ii  is  easy  to  show  that,  in  frozen 
mode,  the  effect  of  this  array  upon  a  column  vector  x  input  from  the  left  and  a  scalar  y  input  from  the 
top  IS  to  produce  the  scalar  output  y  •  x^u  w  hich  emerges  from  the  bottom  cell.  The  vector  x  also  emerg¬ 
es  unchanged  from  the  right  as  depicted  in  figure  6.  It  follows  immediately  that  if  the  network  in  figure 
2  is  frozen  at  time  n.  its  effect  upon  a  vector  [x^.y]  input  from  the  top  is  to  produce  the  scalar  output 
y  -  x^R''(n)u(n)  which  emerges  from  the  column  of  internal  cells  DE.  From  equation  (1 1).  it  can  be 
seen  that  this  is  precisely  the  frozen  residual  defined  in  equation  (50). 

When  the  square-root-free  algorithm  is  employed,  the  array  m  figure  2  may  be  frozen  very  simply 
by  setting  the  forget  factor  P  =  1  and  initialising  the  parameter  6  to  zero  for  any  input  vector  which  is  to 
be  processed  in  the  frozen  mode.  As  pointed  out  in  section  2.4.  this  has  the  effect  of  assigning  zero 
weight  to  that  vector  within  the  overall  least  squares  computation  and  so  the  processing  does  not  affect 
any  values  stored  within  the  array.  This  property  can  be  venfied  quite  easily  by  inspecting  the  square- 
root-ftee  cells  in  figure  4  and  the  resulting  operations  for  the  frozen  cells  are  shown  in  figure  7.  It  fol¬ 
lows  from  the  discussion  above  that  if  a  vector  [x^.  yl  is  input  to  the  top  of  the  frozen  square-root-free 
array  at  time  n.  the  output  which  emerges  from  the  internal  cell  E  will  be  y  -  x^R”'(n)u(n)  and  from 
equation  (26)  it  can  be  seen  that  this  is  again  the  frozen  residual  defined  in  equation  (50).  Note  that  the 
parameter  for  the  final  processing  cell  must  be  set  equal  to  one  to  avoid  suppressing  the  output  re¬ 
sidual  from  the  frozen  square-root-free  network  if  this  technique  is  used. 

Having  established  that  the  function  of  a  frozen  triangular  array  is  given  by  equation  (50).  it 
is  easy  to  see  how  the  least  squares  weight  vector,  if  required,  may  also  be  obtained  without  performtng 
a  back-substitution.  Let  h,  denote  the  p  element  vector  whose  only  non-zero  element  occurs  in  the  i'*' 
location:  it  follows  that  the  effect  of  inputting  the  sequence  of  "impulse"  vectors  [b,.  0]  (i=1.2....p)  to 
the  frozen  array  is  to  produce  the  sequence  of  output  values  w.fn)  (i=I  .2....p)  and  so  the  weight  vector 
w(n)  may  be  extracted  from  the  array  without  the  need  for  any  additional  hardware.  This  technique, 
which  amounts  to  measuring  the  impulse  response  of  the  system,  is  generally  referred  to  as  "weight 
flushing"[.^2]. 
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2.7  Parallel  Weight  Extraction 

The  technique  of  weight  flushing  presented  in  section  2.6  requires  that  the  adaptive  filtering  be 
temporarily  suspended  whilst  the  sequence  of  impulse  vectors  is  fed  into  the  array.  A  p"'  order  system 
would  therefore  have  to  suspend  its  data  processing  for  p  time  instants.  Although  it  is  conceivable  that 
this  weight  flushing  process  could  be  carried  out  at  a  higher  clock  rate,  in  between  the  processing  of  the 
data,  in  a  high  data  rate  environment  this  option  is  unlikdy  to  be  viable  and  an  alternative  technique  is 
required.  As  we  show  bdow,  the  weight  veaor  can  be  produced  in  parallel  with  the  adaption  process, 
in  several  ways,  by  the  addition  of  extra  hardware. 

Before  describing  how  the  weight  vector  can  be  generated,  we  first  consider  an  important  property 
of  the  Q  matrix  upon  which  the  parallel  weight  extraction  techniques  (and  also  some  other  important 
techniques  like  MVDR  beamforming  (201)  depend.  We  have  seen,  in  section  2.2,  that  the  maoix  Q(n) 
updates  the  triangular  manix  R(n  -  1)  to  R(n)  by  rotating  in  the  new  data  vector  at  time  n.  Rather  sur¬ 
prisingly,  the  matrix  Q(n)  can  also  be  used  to  update  the  triangular  manix  R~*^(n  -  1)  to  R”*^(n).  Con¬ 
sider  the  following  identity: 


PR“(n- 1)  O  x'(nlO  (n)Q(n)  o  =I 


.  H 


p"*R"“(n-  1) 


Let 


p''R'”(n-l)  iT(n) 
0(n)  O  =  0 


z^(n) 


so  that,  by  means  of  equations  (20)  and  (55),  we  have; 

"T(n) 

R“(n)  Op  O  «  1 

,Z^(n) 


and  clearly. 


T(n)  -  R’”(n) 


Thus  equation  (56)  becomes; 


ip''R'"(n-l)  iR""(n) 

Q(n)  o  “  O 

o^  z^(n) 


(55) 


(56) 


(57) 


(58) 


(59) 
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and  we  see  that  the  same  rotations  that  update  the  mamx  R  also  update  the  matrix  R~^ 


Recall  from  the  description  of  the  QRD  systolic  array  in  section  2.3  that  the  sine  and  cosine  pa¬ 
rameters  that  define  the  rotation  matnx  0  are  passed  from  left  to  right  across  the  array.  Thus  the  matrix 
R"^  can  be  calculated  by  appending  an  array  of  internal  processing  elements  to  the  right  of  the  basic 
QRD  array:  the  QRD  array  stores  R  and  outputs  the  rotanon  parameters  that  specify  Q  as  R  is  updated 
from  time  instant  to  time  instant;  the  rotation  parameters  are  then  used  by  the  new  array  which  stores 
and  updates  the  matrix  R~”  by  rotating  it  against  a  vector  of  zeros.  The  matrix  R"”  is  lower  triangular 
and  hence  the  new  array  will  also  be  lower  triangular  (see  figure  8). 

Armed  with  this  property  of  the  Q  matrix,  we  can  now  show  how  the  weight  vector  can  be  calcu¬ 
lated  m  parallel  with  the  adaption  process  [29].  Consider  the  following  augmented  data  matrix: 


X^(n)  =  fx(n)  y(n) 


(60) 


The  upper-triangular  matrix  resulting  from  a  QR  decomposition  will  then  have  the  form: 


R.^(n) 


[R(n)  u(n) 

'  e(n)! 


(61) 


where  e(n)  =  y(n)!  is  the  square-root  of  the  filtering  error  power  fix’ y(n) .  It  is  easy  to  show  that  the 
inverse  of  an  upper  tnangular  matrix  is  another  upper  triangular  matrix;  thus  let 


R;‘(n) 


T'(n)  s(n)| 
0^  t(n)J 


(62) 
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where  T'(n)  is  a  (p+l)x(p+l)  upper  triangular  matrix.  s(n)  is  a  p-dimensional  vector,  and  t(n)  is  a  sca¬ 
lar.  Then 


I  =  R;‘(n)R^(n)  = 


T'(n)  s(n)  'R(n)  u(n) 
t(nX  ^  0^  £(n) 


;T'(n)R(n)  T'(n)u(n)  +  e(n)s(n) 

'  t(n)e(n) 


(63) 


hence 


R'Vn)  -e''(n)R‘'(n)u(n),  ^  R“'(n)  E''(n)w(n) 

i  e”'(n)  I  '  0^  e~'(n)  j 


(64) 


where  we  have  used  equation  (1 1 )  to  identily  the  presence  of  the  least  squares  weight  vector  w(n) ,  Thus 
we  can  obtain  the  weight  vector  w(n)  directly  from  the  right-hand  column  of  the  inverse  of  the  augment- 
ed  triangular  matrix  R^(n)  (or  equivalently,  the  complex  conjugate  of  the  bottom  row  of  R^  (n)). 

“H 

The  matn  x  R^  (n)  can  be  calculated  as  shown  in  figure  8  provided  we  ensure  that  the  QRD  array, 
on  the  left-hand  side  of  figure  8,  is  solving  the  augmented  problem.  This  requires  that  the  data  vector 
being  fed  into  the  array,  at  time  n,  is  8  (n)  y(n)  and  hence  that  the  QRD  array  is  now  of  dimension 
(p-rl)x(p-(-l).  The  adaptive  filtering  residual  is  still  available  with  this  architecture  since  the 
(p+ 1  )x(p+ 1 )  QRD  array  can  be  thought  of  as  consisting  of  a  p"’  order  adaptive  filter  processor  (as  shown 
in  figure  2)  with  the  multiplier  cell  (F  in  figure  2)  replaced  by  an  additional  circular  processing  element. 
Clearly  this  additional  circular  processing  element  can  be  modified,  if  necessary,  to  calculate  the  adap¬ 
tive  filtering  residual. 

The  parallel  weight  extraction  technique  described  above  is  based  on  the  fact  that  the  rotation  ma- 
tnx  Q(n) .  as  calculated  by  the  basic  QRD  array  on  the  left-hand  side  of  figure  8,  can  be  used  to  update 
the  matrix  R  (n  -  1).  An  alternative  technique  for  generating  Q(n)  [1][21][22]  leads  to  a  different  ar¬ 
chitecture  for  calculating  the  optimum  weight  vector.  This  technique  rehes  on  the  fact  that  the  rotation 
matrix  <3(n)  is  completely  specified  by  the  relevant  p  rotation  angles  (see  section  2.2)  and  hence  may 
be  reconstructed  from  knowledge  of  these  angles.  As  we  show  below,  the  relevant  angles  can  be  recov¬ 
ered  from  the  bottom  row  of  the  matrix  O(t'):  further  more,  this  row  vector  can  itself  be  calculated,  in¬ 
directly,  based  on  knowledge  of  the  matrix  R””(n  -  1). 

From  equations  (28)  and  (29),  we  have 

j  b(n) 

0  (n)n„  =  p  (63) 

Tf(n) 
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(66) 


and  it  is  easy  to  show  that 

g,  =  ±Y  c  =  ±c,  s  =  ±s,  (69) 

The  ambiguity  of  sign  is  immediately  resolved  by  recalling  that  in  the  constnicbon  of  Qp(n)  the  rotation 
angle  is  always  chosen  such  that  the  cosine  is  positive.  Having  applied  the  first  rotation  to  determine  S] 
and  Cj.  the  vector  in  equation  (66)  takes  the  form: 

io.  -s*  n  c, 

i  •  3 

and  so  the  next  Givens  rotation  serves  to  compute  S2  and  Cj  in  a  similar  manner,  and  so  on.  Note  that  it 
is  also  possible  to  recover  the  rotation  angles  that  define  Op(n)  front  its  right-hand  column.  In  particular, 
we  have 
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s 


Qp(n)Tt„ 


CjSj 

;a(n) 

P-1 

0 

[Y(n) 

i  I 

!  0  < 

1  1 

P  1 

(71) 


so  that  the  sines  and  cosines  may  be  recovered  by  implying  a  sequence  of  Givens  rotations  which  anni¬ 
hilated  the  elements  of  the  vector  a(n)  starting  from  the  bottom  and  working  upwards.  In  this  case  the 
pair  (Sp,  Cp)  are  generated  first  and  (s,,c,)  last. 

Having  established  that  the  matrix  Q(n)  can  be  reconstructed  from  knowledge  of  the  quantibes 
bfn)  and  yin),  we  now  proceed  to  shown  how  these  two  quantities  can  be  calculated  other  than  from 
Q(n)  itself.  Note  from  equation  (41)  that 

b(n)  = -p~'Y(n)R'%- l)x(n)  (72) 

Thus  given  R"^(n  -  1)  and  the  new  data  at  time  n  ( x(n) ),  we  may  use  equation  (72)  to  calculate  the 
vector  b(n)/7(n).  The  value  of  -yfn)  can  easily  be  found  from  the  fact  that  Q(n)  is  orthonormal  and 
hence 


bfnl^  +  y'fn)  =  1 

(73) 

or 

j  -1/2 

7<n)  =  [  b(n)/T<n)  +  1] 

(74) 

Having  calculated  b(n)  and  yfn)  by  this  indirect  method,  we  can  then  calculate  Q(n)  as  above  and 
proceed  to  update  the  matrix  R~”,  without  the  need  to  explicitly  calculate  R.  Oearly.  the  augmented 
data  matrix  R~^(n)  can  also  be  updated  in  this  way  (with  suitably  redefined  quanbties)  thus  allowing 
the  weight  vector  w(n)  to  be  calculated  without  the  need  for  the  triangular  array  that  performs  the  QR 
decomposition  of  the  augmented  data  matrix.  Once  the  optimum  weight  vector  is  known,  the  adaptive 
filtering  residual  could  of  course  be  calculated  using  equation  (27)  but,  since  the  weight  vector  may  not 
be  well  defined  this  method  is  less  robust  than  the  direct  extraction  one  (see  section  2.5). 

Another  method  for  extracting  the  weight  vectcv  without  interfering  with  the  adaption  process  also 
relies  on  the  strucmre  of  the  0  matrix.  From  equation  (72)  we  see  that  if  a  vectw  x^(n)  is  annihilated 


21 


r 


by  Givens  rotanons  against  a  triangular  matrix  PR(n  -  1)  then  the  bottom  row  of  the  0  matrix  contains 
the  term  P"'R”^(n-  l)x(n).  Now  from  equation  (11)  we  have  that: 

w(n)  *  -R'*{n)u(n)  (75) 

X  X 

Hence  if  we  use  Givens  rotations  to  annihilate  the  vector  u  (n)  by  rotation  against  the  matrix  R  (n) 
then  the  bottom  row  of  the  composite  rotation  matrix  will  contain  the  term 

R‘‘(n)u(n)  =  -w(n)  (76) 

i.e.  the  negative  of  the  least  squares  weight  vector.  A  systolic  array  for  implementing  this  algorithm  can 
be  found  in  reference  [31]. 

2.8  Comparison  with  Recursive  Modified  Gram-Schmidt  Algorithms 

The  square-root-free  algorithm  and  architecture  with  direct  residual  extraction  as  described  in  sec¬ 
tions  2.1  to  2.5  was  obtained  independently  by  Ling.  Manolakis  and  Proakis  [15]  based  on  the  method 
of  modified  Gram-Schmidt  (MGS)  onhogonalisahon.  The  MGS  algorithm  operates  on  a  fixed  block  of 
data  and  is  essentially  non-recursive.  It  solves  the  least  squares  problem  by  applying  a  sequence  of  hn- 
ear  combinations  to  the  columns  of  the  block  data  matrix  X(n)  and  transforming  it  into  a  matrix 
X’  (n)  whose  columns  are  mutually  orthogonal.  The  comply  transformation  may  be  represented  by 
means  of  an  upper  triangular  matrix  whose  elements  cotrespond  to  the  triangular  matrix  R  (n)  which 
would  have  resulted  from  applying  a  QR  decomposition  to  the  dau  matrix  X  (n) .  The  process  is  ex¬ 
tended  to  include  the  vector  of  samples  y(n)  in  the  primary  channel  and  hence  extract  from  it  the  com¬ 
ponent  which  is  orthogonal  to  the  columns  of  X  ( n) .  The  bottom  elemem  of  the  resulting  vector  then 
corresponds  to  the  a-posteriori  least  squares  residual  at  time  n.  The  QR  decomposition  and  MGS  tech¬ 
niques  are  clearly  related  except  that  the  former  applies  an  orthogonal  transformation  to  the  data  matrix 
X  ( n)  in  order  to  produce  an  upper  triangular  form  whereas  the  latter  applies  an  upper  triangular  matnx 
transformation  to  X  (n)  in  order  to  produce  a  matrix  with  orthogonal  columns.  Also,  the  QRD  tech¬ 
nique  may  be  applied  in  a  convenient  row-recursive  maimo'  using  a  sequence  of  elementary  Givens  ro¬ 
tations.  The  most  importam  contribution  of  Ling.  Manolakis  and  Proakis  was  to  show  how  the  MGS 
algorithm  could  also  be  updated  one  row  at  a  time  thereby  generating  the  recursive  modified  Gram- 
Schmidt  (RMGS)  algorithm  which  may  be  implemented  using  a  triangular  processing  architecture  sim¬ 
ilar  to  that  in  figure  2. 

Motivated  by  the  desire  to  update  the  stored  triangular  matrix  elements  directly  rather  than  as  a 
ratio  of  otha  updated  terms  which  occur  more  naturally  in  the  RMGS  approach,  they  manipulated  their 
algorithm  funher.  The  resulting  update  equations  were  found  to  involve  an  important  elemem  of  feed¬ 
back  which,  leeds  to  an  improvement  in  numerical  stability.  It  is  interesting  to  note  that  the  error  feed¬ 
back  RMGS  algorithm  derived  by  Ling.  Manolakis  and  Proakis  [15]  is  identical  to  the  form  of  square- 
root-free  Givens  rotation  algorithm  defined  in  figures  2  and  4  [14].  It  was  this  relationship  which  led  us 
to  refer  to  the  basic  operation  in  figure  4  as  the  square-root-free  with  feedback  (SF/FB)  Civets  rotation 
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and  to  identify  the  underlying  feedback  mechanism  which  is  inherent  to  it. 

2.9  Comparison  with  Kalman  filter  algorithms 

The  QR  decomposition  algorithms  for  recursive  least  squares  processing  as  defined  in  figures  2. 3 
and  4  operate  directly  on  the  basic  data  matrix  X  (n)  as  opposed  to  the  data  covariance  matrix  M(n) 
and.  as  noted  previously,  this  has  significant  numerical  advantages.  It  was  also  shown  in  equation  (8) 
that  the  upper  triangular  matrix  R  ( n)  which  is  stared  and  updated  within  the  processor  array  is  analyt¬ 
ically  identical  to  the  Cholesky  square-root  factor  of  the  data  (or  information)  covariance  matrix.  In  the 
nomenclature  of  Kalman  filtering,  the  QRD  algcsithm  constinites  a  numerically  stable  form  of  square- 
root  information  Kalman  filter  with  unit  state-space  matrix. 

It  has  recently  been  shown  by  Chen  and  Yao  [3]  how  the  algorithm  and  triangular  array  architec¬ 
ture  may,  in  fact,  be  extended  to  the  case  of  a  general  square-root  information  Kalman  filter  with  arbi¬ 
trary  state  space  matrix.  Gaston  et  al  [6]  have  also  shown  bow  the  triangular  QRD  array  may  be  used  as 
the  core  processor  in  a  general  square-root  covariance  Kalman  filter.  Note  that  in  Kalman  filtering  no¬ 
menclature  the  term  "covariance”  does  not  refer  to  forming  or  using  the  data  covariance  matrix  M  ( n) . 
Instead,  it  denotes  an  algorithm  which  is  based  on  updating  the  matnx  M~'  (n)  (or  its  Cholesky  square 
root  factor  R“ '  ( n)  in  the  present  context)  since  this  specifies  the  covariance  of  the  weight  vector  esti¬ 
mate.  In  the  special  case  of  unit  state-space  matrix  the  covariance  Kalman  filter  reduces  to  a  least 
squares  estimation  algorithm  which  makes  use  of  the  well-known  matrix  inversion  lemma  to  perform 
successive  rank-one  updates  of  the  matrix  M~*  (n) .  This  is  often  referred  to  as  the  “Recursive  Least 
Squares"  (RLS)  algorithm  although  it  is  fundamentally  different  from  the  QRD-based  recursive  least 
squares  technique  described  in  this  memorandum.  For  example,  as  is  often  pointed  out  "persistent  ex¬ 
citation"  is  essential  if  the  traditional  RLS  algorithm  is  to  retain  numerical  stability  due  to  the  1  term 
which  occurs  in  the  associated  Riccati  equation.  However  this  problem  does  not  arise  with  the  QRD  al¬ 
gorithm  described  in  sections  2. 1  to  2.3.  It  would  seem  sensible,  then,  to  refer  to  the  traditional  RLS  and 
the  alternative  QRD  techiuques  as  the  “covariance"  and  “square-root  information"  recursive  least 
squares  algorithms  respectively. 

Finally,  it  is  worth  noting  that  several  authors  (1][21][22]  have  recently  developed  stable  "square- 
root  covariance"  least  squares  algorithms  and  architectures  based  on  the  technique  illustrated  in  figure 
8  for  updating  the  “square-root  covariance"  matrix  R~'  (n) .  Their  technique,  however,  makes  clever 
use  of  a  pinning  vector  to  avoid  storing  and  updating  the  “square-root  information"  matrix  R  (n)  ex¬ 
plicitly  and  only  requires  a  single  triangular  processor  array  as  described  in  section  2.7.  It  is  not  clear 
how  their  method  would  extend  to  the  general  “square-root  covariance"  Kalman  filtering  proMem  or 
how  it  relates  to  the  "square-root  covariance"  Kalman  filter  architectures  proposed  by  Gaston  et  al  [6]. 

3  Adaptive  FIR  Filtering. 

3.1  The  ORO  Approach. 

In  section  2  we  saw  how  an  adaptive  linear  combiner  could  be  applied  to  the  problem  of  narrow- 
band  adaptive  beamforming.  The  same  linear  combiner  could  be  used  to  construct  an  adaptive  FIR  fil- 
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ter.  In  this  case,  the  combined  output  at  time  tj  is  given  by  the  equation 


at) 


_x’^(tj)w  +  y(tj) 


p-l 


j.O 


(77) 


which  is  identical  to  the  narrow-band  beamformer  case  (equation  (]»  except  that  the  input  vector  x(t) 
now  exhibir«  4  high  degree  of  time-shift  invariance.  This  property  manifests  itself  in  the  fact  that  the 
“data  matrix"  (Xp(n)  of  equation  (79))  has  a  Toeplicz  structure  i.e.  each  row  of  the  matrix  is  obtained  by 
shifting  the  previous  row  one  column  to  the  right  and  introducing  one  new  data  sample.  Various  algo¬ 
rithms  have  been  devised  that  take  advantage  of  this  redundancy  and  so  reduce  the  computational  load, 
for  a  p***  order  filter,  from  O(p^)  to  0(p)  arithmetic  operations  ptf  sample  time  (see  Haykin  [12]).  The 
common  basis  for  these  fast  algorithms  is  an  efficient  technique  for  solving  the  least  squares  linear  pre¬ 
diction  problem.  The  concepts  of  forward  and  backward  linear  prediction  must  both  be  introduced  for 
this  purpose.  The  adaptive  filtering  problem  can  then  be  solved  using  quantities  already  calculated  dur¬ 
ing  the  linear  prediction  stages.  Unfortunately  the  majority  of  these  fast  algorithms  exhibit  some  form 
of  numerical  instability  although  much  work  has  been  done  to  overcome  the  numerical  problems  and 
various  rescue  procedures  have  been  developed  -  see  [12]. 

As  we  have  seen,  it  is  possible  to  solve  a  least  squares  minimisation  problem  using  the  technique 
of  QR  decomposition.  Extensive  computer  simulations  of  this  algonthm[32]  have  shown  the  QRD- 
based  least  squares  minimisation  algorithm  to  have  excellent  numerical  properties.  However,  since  the 
recursive  QRD  algorithm  presemed  in  section  2  is  designed  to  solve  a  general  recunive  least  squares 
minimisation  problem,  it  requires  O(p^)  operations  per  sample  time  to  generate  the  solution  to  a  p***  or¬ 
der  adaptive  filter  problem.  A  QRD-based  algorithm  which  is  designed  for  the  special  case  of  adaptive 
filtering  and  only  requires  0(p)  tolerations  per  sample  time  is  thus  of  considerable  interest 
[13][17][26][33].  In  order  to  simplify  the  analysis  we  consider  only  real  signals.  The  extension  to  the 
complex  case  is  straight  forward  and  indeed  the  algorithms  presented  in  the  appendix  are  for  complex 
signals. 

In  a  least  squares  adaptive  filter  of  order  p.  the  set  of  p  weights’  Wp(n)  at  time  n  is  chosen  in  order 
to  minimise  the  sum  of  the  squared  differences  between  a  reference  signal  y(n)  and  a  linear  combination 
of  the  p  samples  from  a  data  time  series  x(t„.i)  (0  $  i  $  p-l ).  Specifically,  the  measure  to  be  minimised 
is  fiep(n)ll  where: 


«p(")  •  Xp(n)Wp(n)  +  y(n) 


(78) 


I .  in  this  section  we  append  a  subscnpt  to  all  variables  to  indicate  the  order  of  the  problem  being  solved 
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(79) 


lx(l)  x(0) 

Xp(n)  =  B(n) 


x(2-p)  " 
x(3-p)  ! 


ix{n)  x(n-  1)  ...  x(n-p+  1) 


and 


y(n)  =  B(n)J'(l)  y(n)]^  (80) 

Compared  with  equations  (5)  and  (6).  we  see  that  equations  (78)  to  (80)  constimte  a  standard  least 
squares  minimisation  problem  except  for  time-shift  invariance  of  the  data.  We  could  therefore  proceed 
to  solve  this  least  squares  minimisation  problem  via  the  QR  decomposition  technique  described  in  sec¬ 
tion  2.1.  In  order  to  do  so  we  must  determine  an  orthogonal  matrix  Qp(n)  that  transforms  the  matrix 
Xp{n)  into  upper  triangular  form^  and  use  the  same  matrix  Qp(n)  to  rotate  the  reference  vector  j;(n).  i.e. 

Qp(n)Xp(n)=  (81) 

and  Qp(n)3((n)«  (82) 

Vp(n) 

It  should  be  noted  that  once  Qp(n)  has  been  found,  the  filtering  problem  has  effectively  been 
solved  Knowledge  of  Qp(n)  means  that  y^fn)  is  known.  It  also  allows  the  angle  normalised  residual 
a  (n)  -  the  last  component  of  the  vector  y^fn)  -  to  be  calculated  and  thus  the  least  squares  residual  may 
be  found  (see  section  2.5).  The  O(p^)  QRD-based  algorithm  for  the  solution  of  a  p"*  order  least  squares 
minimisaaon  problem  as  described  in  section  2  has  many  desirable  features.  It  operates  in  the  “data  do¬ 
main"  and  has  a  time-recursive  formulatitm  with  time-independent  computational  requirement  and  a 
regular  parallel  architecture.  However  the  time  shift  redundancy  in  the  adaptive  filtering  problem  can 
be  used  to  improve  the  method  further  by  reducii^  the  computational  load  from  O(p^)  to  0(p).  The  de- 
vdopmem  of  fast  QRD  algorithms  for  adaptive  filtering  is  based,  almost  entirely,  on  the  principle  of 
constructing  partially  triangularised  matrices  from  known  quantities  and  then  finding  a  set  of  rotations 
to  complete  the  process.  We  recall  that  this  was  also  a  key  dement  in  the  derivation  of  the  time-recursive 
dgorithm  in  section  2.2.  The  solution  at  time  n  was  generated  by  rotating  the  new  input  data  for  time  n 
into  the  upper  triangular  matrix  associated  with  the  solution  at  time  (n- 1 ). 

Recall  that  the  set  of  rotations.  Qp(n).  required  to  solve  the  adaptive  filtering  problem  are  entirely 
dependem  on  the  matrix  Xp(n).  The  matrix  Xp(n)  can.  however,  be  built  up  in  an  order  recursive  manner 

2  In  the  following,  we  use  shading  in  order  to  emphasise  the  structure  of  the  non-zero  elements  of  a 
matrix 
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by  adding  extra  columns  which,  because  of  its  Toeplitz  structure,  consists  of  one  new  element  and  a 
time-shified  version  of  the  previous  column.  Consider  the  following  decompositions: 


'x{l)...  x(2-p) 

Xp(n)-B(n)  ;  (83) 

!x(n)  ...  x(n-p+  1)^ 


P"“'x(i) 

Xp.,(n-1) 


where^  ^p-  “  B(n-  1)  x(2) . x(n)  ^  (86) 

y^_,(n)  -  B(n)  x(-p  +  2) . x(n-p+l)^  (87) 

and  z  -  pn  -  >  'x(O) . x(-  p  +  2)  ^  (88) 


Note  from  equation  (84).  that  if  we  had  already  determined  the  rotation  matrix  Qp.jln)  which  triangu¬ 
larises  the  matrix  Xp.|(n).  we  could  use  it  to  partially  triangulanse  the  matrix  Xp(n).  In  doing  so  we 
would  also  have  to  rotate  the  vector  y^  _  ^fn)  but  these  are  exactly  the  steps  required  in  the  QRD-based 
solution  of  the  (p- 1 )”  order  backward  linear  prediction  problem. 

In  the  (p- 1 )“  order  backward  linear  prediction  problem  at  time  n.  an  estimate  of  x(n-p+ 1 )  is  formed 

from  a  linear  combination  of  the  data  f  x(n) . x(n-p4-2)).  The  solution  to  this  problem  depends  on  the 

triangularisation  of  the  matrix  Xp.|(n)  and  the  transformabon  of  the  reference  vector  y^  |(n).  Hence, 
knowing  the  solution  to  the  (p-l)"  order  backward  problem  at  time  n  would  allow  us  to  ccmstruct  the 
partially  triangularised  matrix  Qp_  ,(n)Xp(n)  from  known  quantities  and  thus  save  a  large  amount  of 
computation.  This  partially  triangularised  matrix  could  then  fc  transformed  imo  the  triangular  matrix 
Rp(n)  by  a  sequence  of  Givens  rotations. 

Equation  (83)  allows  another  partially  triangularised  version  of  Xp(n)  to  be  constructed,  this  time 
using  quantities  from  the  (p-l)”  order  forward  linear  prediction  problem.  The  (p-l)"  order  (forward) 

3  Note  thtt  the  subscript  ‘p’  attached  to  the  veaor  y^  is  superfluous  and  that  y^  ■  yp  ^  etc  Its  use 
is  merely  to  preserve  symmetry  with  the  vector  y^  for  which  the  subscnpt  is  necessary 
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Figure  9  QRD-Based  Lattice  Algonthm 


linear  prediction  problem,  at  time  n.  is  defined  as  the  esumation  of  x(n)  based  upon  the  data 
{x(n-l) . x(n-p+l)l.  This  involves  the  tiiangularisation  of  the  matrix  Xp.|(n-1)  and  the  transforma¬ 

tion  of  the  relevant  reference  veaor  y'  _  |(n).  We  could  then  use  the  decomposition  given  in  equation 
(83)  to  generate  a  partial  triangularisaaon  of  the  matrix  X^In)  from  known  quantities.  The  formation  of 
the  triangular  matnx  Rp(n)  could  then  be  achieved  very  easily  using  a  sequence  of  Givens  rotations. 

It  is  clear  that  the  two  linear  prediction  problems  of  order  (p-l)  are  intimately  connected  to  the 
problem  of  determining  a  minimum  set  of  rotations  which  produce  the  triangular  matnx  Rp(n).  Howev¬ 
er.  the  triangulansaoon  of  Xp(n)  is  central  not  tmly  to  the  adaptive  filtering  problem  but  also  to  the  p"* 
order  linear  prediction  problems.  The  rotations  which  transform  the  matrix  Xp(n)  into  Rp(n)  are  used 
to  solve  the  forward  problem  at  time  (n^-l)  and  the  backward  problem  at  time  n.  With  a  suitable  time 
delay,  we  can  therefore  construct  an  order  recursive  algorithm  for  linear  prediction  and  adaptive  filter¬ 
ing  (figure  9). 

It  IS  also  possible  to  develop  another  type  of  QRD-based  fast  algorithm.  From  the  discussion  above 
it  is  clear  that  we  have  a  fast  method  for  transforming  Rp  _  ,(n  -  1)  into  Rp(n)  via  equation  (83).  This 
technique  allows  us  to  transform  the  matrix  Qp  _  ,(n  -  1 )  into  Qp(n)  (i.e.  we  have  a  time  and  order  up¬ 
date).  Equation  (84)  constitutes  a  method  for  transforming  Rp(n)  imo  Rp  _  |(n)  and  hence  Qp(n)  into 
Qp  _  ,(n)  (i.e.  an  order  down-date).  Thus  by  combining  these  two  transformations  we  can  achieve  an 
overall  time  update  for  the  rotation  matiix  Qp  (figure  10).  This  iransfonnation  does  not  lead  duectly  to 
the  construction  of  a  fast  algorithm  since  having  to  calculate  the  various  Q  matrices  explicitly  would 
require  Oip^)  operations  and  does  not  represent  a  reduction  in  the  computational  requirement.  However 
we  have  already  seen  that  the  value  of  this  matrix  can  be  inferred  from  knowledge  of  its  right-hand  col¬ 
umn  (section  2.7).  Gearly  the  transformations  that  update  the  matrix  Qp  will  also  perform  a  time  update 
for  this  column  vector.  Then,  because  we  ve  now  dealing  with  a  vector  rather  than  a  matrix,  the  number 


of  computations  required  to  perform  this  type  of  update  is  0(p)  and  a  “fast"  algorithm  results. 

Traditionally  "fast"  adaptive  filtenng  algorithms  fall  into  two  classes:  the  least-squares  lattice  al¬ 
gorithms  and  the  "fast  Kalman"  algorithms.  Both  types  of  algorithm  solve  the  p"*  order  linear  prediction 
problem  in  0<pl  operations.  The  least-squares  lattice  algorithms  do  this  by  solving  all  of  the  lower  order 
problems  in  sequence,  whereas  the  "fast  Kalman"  algorithms  concentrate  on  a  problem  of  given  order 
and  achieve  the  reduction  in  computational  load  by  using  the  time  and  order  update  /  order  downdate 
technique.  Not  surprisingly,  the  two  classes  of  algorithms  have  tended  to  be  quite  different;  the  fast  Kal¬ 
man  algonthms  usually  calculate  the  transversal  filter  coefficients  explicitly,  whereas  the  lanice  algo- 
nthms  deal  directly  with  the  filter  residuals  (errors)  and  calculate  reflection  coefficients.  The  fast  Kal¬ 
man  algonthms  also  tend  to  require  fewer  arithmetic  operations  than  the  lattice  algorithms  although 
both  are  linear  in  the  problem  order.  The  level  at  which  the  two  different  algonthms  can  be  pipelined  is 
different  -  the  lattice  algorithms  having  a  higher  degree  of  concurrency  It  is  also  worth  noting  that  the 
data  downdating  step  which  is  implicitly  required  by  the  fast  Kalman  algonthms  gives  cause  for  concern 
with  regard  to  numerical  stability. 

Based  on  the  above  classification,  we  refer  to  the  two  fast  algonthms  derived  in  this  memorandum 
as  the  QRD-based  least-squares  lattice  algonthm  (figure  9)  and  the  QRD-based  fast  Kalman  algorithm 
(figure  10).  It  should  be  noted,  however,  that  this  latter  algonthm  does  not  calculate  the  transversal  filter 
coefficients  explicitly;  instead  it  generates  the  ra)uired  filter  output  using  the  QRD-based  method  of 
"direa  residual-extraction"  discussed  in  section  2.3.  The  QRD-based  fast  Kalman  algorithm  is  also  un¬ 
usual  in  that  it  quite  naturally  produces  the  solution  to  all  lower  order  problems  whereas  fast  Kalman 
algorithms  are  usually  seen  as  being  of  “fixed  order".  This  property  is  a  natural  consequence  of  using 
the  QRD  teclmitpie  (see  section  3.6).  We  begin  the  detailed  derivation  of  these  fast  algorithms  by  con¬ 
sidering  the  problem  of  determining  an  efficient  method  for  the  solution  of  the  order  forward  linear 
prediction  problem. 


The  p"*  order  forward  linear  prediction  problem,  at  time  n.  requires  the  determination  of  the  vector 
of  filter  coefficients  w^fn)  ■  . '*'p.p-i^"^J^  ^  minimises  the  total  prediction  error 
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ep(n)  where 


ep(n)  =  Xp(n- I)Wp(n)+yf(n)  (89) 

with  Xp  ( n  -  1 )  and  ( n)  as  defined  in  equations  (83)  and  (86)  respectively.  In  order  to  solve  this 
least  squares  problem  via  the  QR  decomposition  technique  we  have  to  determine  the  rotation  matrix 
Qp(n-l)  that  triangularises  the  data  matrix  Xp(n-1)  and  then  apply  it  to  the  vector  y^(n)  in  ordo’  to  cal¬ 
culate  the  angle  normalised  residual  ap(n)  (cf.  equation  (21)).  We  also  need  to  be  able  to  calculate 
y^(n  -  1)  in  order  to  generate  the  a-posteriori  prediction  residual  (see  equation  (33)).  Note  also  that  the 
tnangularisation  of  Xp(n- 1 )  is  exactly  what  we  require  in  the  solution  of  the  p**"  order  adaptive  filtering 
problem  at  time  (n-1)  -  see  equation  (78).  Consider,  therefore,  the  following  composite  mauix: 

Ml  =  yf(n)  Xp(n- 1)  y(n- 1)  (90) 

From  equations  (14).  (19)  and  (28).  we  have  that 


T 

Qp(n-l)rt„.,  =Qp(n-l)n„_,  =  a^(n  -  1).  o'^.  Yp(n  -  1) 


(91) 


It  should  be  clear,  therefore,  that  the  vector  JSn-i  in  the  above  matrix  (equation  (90))  will  enable  us 
to  calculate  yp(n  -  1 )  just  as  the  vector  y  ^ (n)  allows  a^In)  to  be  calculated.  Similarly  the  presence  of 
the  vector  y(n- 1 )  will  allow  us  to  calculate  a^fn  -  1 ) . 

Now  from  equations  (84)  and  (86)we  have  that 

yp_i(")  \-i<"~ ')  Xp- "n-i 


where  we  have  used  the  faa  that  y'(n)  >  y^_  i(n)in  order  emphasise  the  appearance  of  the  lower  order 
problem  in  the  decomposition  of  the  matrix  M].  Hence 


Qp.|(n-1)M, 


Up.  ,(n)  Rp_  ,(n-  1)  ,(B - 1)  Up.  ,(n -  D  Up.  ,(n -  D 

;yp-i(">  O  3^- tC““l>yp-i(n-»gp., ^ 


(93) 


-T 

where  ?p-i<"" “  iP^-Yp.,(n- •) 

It  is  easy  to  show  that  yp  _  ,(n)  and  Yp.  ,(n  -  1)  must  have  a  time  recursive  decomposition  sim¬ 
ilar  to  that  given  in  equation  (21)  for  yp  _  |(n  - 1) .  Hence 
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(95) 


,(n)  Rp-i(n- 

•1)  Up_,(n-1)  9p-,(n-l) 

(n-1) 

0 

Pyp_l(n--2)Pyp.,(n-2)  o 

i(n) 

o"^ 

Op., (n-1)  Yp_,(n-1) 

Now  suppose  that  we  had  already  calculated  a  rotation  matrix^  Qp(n  -  1 )  say.  that  rotates  the  vec¬ 
tor  _  j  (n  -  2)  into  a  form  where  only  the  top  element  is  non-zero  i.e. 

!  li 


,(n) 

.,(n- 

J)  u^,(n- 

1)  9p-l 

i(n- 

1) 

1 

C 

1 

a. 

1) 

(n- 

1) 

0^ 

,(n- 

-2) 

,(n- 

•2) 

0 

P  1 

1 

P  * 

=  Mj  (96) 

1) 

0 

0 

,(n- 

■2) 

0 

,(n) 

0^ 

»)  «p- 

i(n- 

1) 

Yp-i("- 

1) 

'  quantities  Pp . 

.,(n- 

-,(n-2). 

,(n 

-  2)  and  E 

,b 

P- 

i(n  -  2)  are  defined 

by  this  operation  and  we  note,  by  analogy  with  equation  ( 1 2).  that  _  j(n  -  2)  is  the  square-root  of  the 

(p- 1 )“  order  backward  prediction  energy  at  time  (n-2). 

Now  m  order  to  complete  the  triangulansation  of  the  matrix  Xp(n- 1 )  (see  equation  (92))  all  that  is 
required  is  the  anmhilation  of  the  single  element  a^_  |(n  -  1).  This  can  be  earned  out  using  a  single 
Givens  rotation: 


Op(n)  Mj 


“p-i<">  Rp-i<"- 

l)Bp_,(n-5) 

Wp-i(n-  >) 

ap-,(n-') 

Rp-l^")  9"^ 

Rp-,<"-') 

ap(n-  1) 

Pdf_,(n-1)  O 

0 

P0p.,(n-2) 

0 

oip(n)  o^ 

0 

ap(n-l) 

Yp(n-l) 

(97) 


Up(n)  Rp(n - 1)  Up(n  -  I)  ap(n  -  1) 
O  yp(n-l)gp(n-l) 


(98) 


4  The  notation  for  the  rotation  matnees  introduced  in  this  memorandum  are  somewliat  arbitrary  in  order 
to  solve  the p'*  order forwardUnui  prediction  problem,  we  must  annihilate  quantities  from 
order  backward  predicbon  problem.  Following  the  nomenclature  used  for  reflecbon  coefficieits,  the 
rotauon  matnees  used  here  are  labelled  according  to  the  problem  to  which  they  relate  rather  than  the 
quanbbes  they  vmihitate. 
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where  ap(n  -  1)  is  defined  by  this  operation.  The  identity  in  equation  (98).  and  hence  the  labelling  of 
some  of  the  elements  in  the  bottom  row  of  the  right-hand  matrix  in  equation  (97),  follows  by  definition 
(see  equation  (92)).  Note  from  the  above  that  the  "new”  quantities  frp_  [(n  -  1)  and  dp_  |(n  -  2).  in¬ 
troduced  in  equation  (96).  are  equivalent  to  existing  variables.  Indeed,  from  equations  (97)  and  (98)  we 
have  that 


so  that,  see  equation  (21). 


and 


1)  pd  (n-2) 

^  “1 

(99) 

P  * 

“  y^(n)y  (n-l) 

ap(n-l) 

.  P  P  J 

«p.,(n)  = 

(100) 

dp -,<■’)  = 

Vp(n) 

(101) 

From  the  above  we  see  that  the  sequence  of  orthogonal  transformations  shown  in  equations  (93). 
(96)  and  (97)  solve  the  p‘^  order  forward  linear  prediction  problem.  Note,  however,  that  the  matrix  op¬ 
erated  upon  by  Op(n)  m  equation  (97)  consists  entirely  of  quantities  that  would  be  available  if  the 
(p-1  )*'  order  forward  and  backward  problems  had  already  been  solved  at  time  n  and  n-1  respectively.  If 
this  assumption  were  true  then  we  could  have  constructed  this  intermediate  matrix  directly,  thereby  cir¬ 
cumventing  the  need  for  the  operations  as  outlined  in  equations  (93)  and  (96).  Only  the  single  Givens 
rotation  of  equation  (97)  would  actually  need  to  be  performed  and  so  the  number  of  arithmetic  opera¬ 
tions  required  would  be  independent  of  p;  only  eight  elements,  one  of  which  is  zero,  of  the  left  hand 
matrix  in  equation  (97)  are  affected  by  the  required  rotation.  Having  derived  a  fast  method  for  solving 
the  forward  linear  prediction  problem,  we  now  consider  a  fast  update  method  for  the  auxiliary  (back¬ 
ward)  problem. 

3.3  Backward  Umar  Prediction 

The  p’*'  order  backward  linear  prediction  problem,  at  time  n.  requires  the  determination  of  the  vec¬ 
tor  of  filter  coefficients  Wp(n)  «  [  Wp  g(n) . '*'p  p  - 1  ^  minimises  the  total  prediction  error 

Cptn)  where 

«p<")  “ 

Again  the  least  squares  solution  to  this  problem  can  be  found  by  the  method  of  QR  decomposition. 
It  is  necessary  to  determine  the  rotation  matrix  Qp(n)  that  triangularises  the  data  matrix  Xp(n)  and  then 
apply  it  to  the  vector  y^(n)  in  order  to  calculate  ap(n)  (cf  equation  (21 )).  We  also  need  to  be  able  to 
calculate  Tp(n)  (see  equation  (33))  in  order  to  generate  the  a-po$teriori  prediction  residual.  Consider, 
tlierefore.  the  following  conqxisite  matrix  and  the  illustrated  decomposition  which  is  a  single  extension 
of  equation  (83) 
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(103) 


Xp(n)  yb(n) 


P"-'x(l)  o'^  0  0 


In  equation  ( 103).  it  has  been  assumed  that  the  data  sequence  x(n)  is  pre-windowed  (i.e.  x(n)  =  0 
for  n  ^  0).  Note  that  this  is  the  only  place  in  the  analysis  where  we  require  this  assumption^.  Consider 
the  effect  of  the  rotation  matrix  Qp.i(n-l)  on  the  lower  part  of  the  matrix  in  equation  (103): 


[p"‘'x(I)  p'^  0  0  • 

In  ,,  ^4=  uf.,(n)  Rp.,(n-l)u^.,(n-l)ap.,(n-l)  =M5  (104) 

P  Vp_  ivn  -  1) 

''f_,(n)  O  y^_|(n-])gp_j(n-I) 


As  before,  all  the  vectors  on  the  bottom  row  of  the  matrix  M5  may  be  written  in  terms  of  their  un¬ 
derlying  time  recursion  and  thus  we  obtain  the  expression: 


P"‘‘x(l) 

0 

0 

V>p.i(n)  Rp-i(n- 

«)  hp.i(n-  1)  ap. 

,(n-l) 

Pyp_j(n-i)  0 

py^.,(n-2) 

0 

«p.  ,(<t)  P’^ 

yp- 

,(n-l) 

(105) 


Now  suppose  that  we  have  already  constructed  a  rotation  matrix  Qp(n  -  1)  that  annihilates  the 
vector  Vp  _  |(n  -  1 )  by  rotation  against  the  element  P"  "  ‘  x(  1 )  i.e. 

Qp(n-l)p 
1 

Now  let  0p(n)  be  the  rotation  matrix  that  annihilates  the  element  Op  _  |(n)  by  rotation  against  the 
element  Pe^  _  j  (n  -  1 ) .  Application  of  the  transformation  0p(n)  to  the  above  matrix  yields  the  result: 


p£^_,(n-l) 

oT  Pp^_,(n-2)  0 

^-i^P-l) 

P 

0  pd^.,(n-2)  0 

P^  Op./n- ■)  Yp_i(n-1) 

=  M^ 


(106) 


5.  It  is  possible  to  develop  a  QRD-based  fast  Kalman  algorithm  in  which  x(n)  *  0  for  n  ^  0.  The 
resulting  algorithm  is  based  on  much  of  the  pre-windowed  version  presented  ho-e  but  with 
some  extra  computation  -  see  CiofH  [4]  for  hirther  details.  The  authors  are  not  aware  of  any 
similar  work  for  the  QRD-based  lattice  algorithm. 
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Qp(n)  Mp  =  P 


Hp_ 

,(n-l) 

V") 

.,(n- 

-J) 

,(n-l) 

ap_,(n-l) 

0 

0 

,(n-2) 

P 

0 

0^ 

A  b 

“p 

-i(n) 

Yp-,(n) 

=  M7 


(107) 


where  the  new  quantities  ap(n),  ap_  ,(n)  and  yp_  ,(n)  are  defined  by  this  equation.  Bearing  in  mind 
the  underlying  data  matrix  (see  equation  (103)),  recall  that  we  are  attempting  to  create  an  upper-trian¬ 
gular  pxp  matrix  in  the  upper  left-hand  comer  on  the  matrix  in  equation  (107).  At  present  this  sub-ma- 

-  b 

tnx  IS  not  quite  triangular  but  a  little  thought  shows  that  it  is  easy  to  construct  a  matrix  (Qp(n)  say ) 

-  b 

which  will  complete  the  required  triangularisation.  Specifically,  let  Qp(n)  be  constructed  from  a  se¬ 
quence  of  Givens  rotations  such  that  each  rotation  annihilates  one  element  of  the  vector  Up_  j(n)  in 
turn.  Provided  we  start  with  the  last  element  of  Up_  |(n)  and  work  upwards,  the  sequence  of  rotations 
will  not  destroy  the  triangular  structure  of  the  matrix  Rp_  |(n  -  1)  although  its  value  will  be  changed 
as  a  result.  Thus 


Qp(n)  M7  = 


Rp(n)  Up(n)  ap(n) 
O  vP(n)  gp(n) 


(108) 


Note  that  the  matrix  Qp(n)  only  affects  the  upper  part  of  the  panitioned  matrix  on  the  left  hand 
side  of  equation  (108).  It  should  therefore  be  clear  that 

cip.  |(n)  =  ap(n)  and  Yp.  |(n)  =  Yp(n)  (109) 


Also  note  that 


P’»p-,(n-2) 

a^(n) 


=  v>) 


(110) 


and  so  the  "new"  quantity  0^  _  |(n  -  2) ,  introduced  in  equation  (106).  is  equivalent  to  an  existing  var¬ 
iable.  Indeed,  from  equation  (21 )  we  have 


(111) 


Hence  the  sequence  of  orthogonal  rotations  given  in  equations  ( 104),  (106),  (107)  and  (108)  solve 
the  p'*’  order  backward  linear  prediction  problem.  Following  the  development  of  the  solution  to  the  for¬ 
ward  problem  in  section  3.2.  note  that  the  data  matrix  on  the  left-hand  side  of  equation  (107)  could  be 
constructed  directly  given  the  soluhons  to  the  (p-l)“  order  forward  and  backward  linear  prediction 
problems  at  time  n  and  n-1  respectively.  Thus  the  transformations  shown  in  equations  (104)  and  (106) 
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Figure  1 1  QRD-based  Least-squares  Lattice  Section. 

could  be  by-passed.  Furthermore,  assuming  that  we  are  only  interested  in  the  prediction  residuals,  the 
transformation  shown  in  equation  ( 108)  is  not  required  either,  since  a^fn)  and  y^ln)  are  both  available 
in  the  matrix  M7.  Thus  only  the  Givens  rotation  summarised  in  equation  (107)  need  actually  be  per¬ 
formed  and  the  number  of  arithmetic  operations  required  is  independent  of  p;  only  six  elements,  one  of 
which  is  zero,  of  the  matrix  are  affected  by  the  rotation. 

3.4  The  QRO  Least-squares  Lattice  Algorithm. 

Gathering  together  the  results  of  sections  3.2  and  3.3  we  see  that  it  is  possible  to  utilise  various 
terms  from  the  solution  to  the  (p-l)“  order  forward  and  backward  linear  prediction  problems,  at  time  n 
and  (n- 1 )  respectively  to  generate  corresponding  terms  for  the  solution  to  the  p''^  order  problems  at  time 
n  (see  figure  11).  Note  that  the  processing  elements  shown  in  figure  II  are  the  same  as  those  used  in  the 
triangular  systolic  array  described  in  section  2.3  (i.e.  as  shown  in  figure  3).  It  is  possible  to  show  that 
the  corresponding  square-root-free  implementation  may  be  obtained  very  simply  by  substituting  the 
processing  elements  shown  in  figure  4. 

Given  that  0^''  order  linear  prediction  is  trivial,  we  can  thus  generate  the  solution  to  the  p"*  wder 
problem  by  iteration  in  order  using  a  cascade  of  the  sections  shown  in  figure  1 1 .  The  resultant  architec¬ 
ture  (figure  12)  has  a  lattice  structure  and.  since  the  number  of  operations  per  stage  is  independem  of  p, 
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Delayed  Adaptive 
Filtering  Residual 


Figure  12  "Delayed"  Adaptive  Filtering  Lattice 


0(p)  operations  are  required  to  solve  the  p'*'  order  problem  (see  appendix  for  algorithm  listing).  Note 
that  by  including  the  adaptive  filtering  reference  vector  y(n'  1 )  in  the  calculation  of  the  p‘*'  order  forward 
linear  prediction  problem  (section  3.2)  we  automatically  solve  the  p***  order  adaptive  filtering  problem 
for  time  (n-1).  The  solution  to  the  adaptive  filtering  problem  for  time  n  can  be  easily  derived  from  the 
above.  Note  that  in  figure  1 1  the  rotation  parameters  used  (in  the  square  processors)  to  calculate  each 
order  update  of  the  delayed  joint  process  residual  are  evaluated  (in  the  round  processors)  from  the  de¬ 
layed  backward  residual.  Thus  by  using  the  undelayed  backward  residual,  we  can  calculate  the  rotations 
required  to  update  the  undelayed  joint  process  residual  -  see  figure  1 3.  In  this  form,  a  stage  of  the  QRD- 
based  lattice  can  be  seen  to  be  very  reminiscent  of  that  of  the  standard  least-squares  lattice:  the  only 
difference  is  that  the  former  structure  has  rotation  processors  instead  of  the  (reflection  co^ciem)  mul- 
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tipliers  of  the  conventional  form  (see  Haykin  [12]  figure  17.2).  Although  conceptually  appealing,  the 
QRD-based  lattice  filter  stage  shown  in  figure  13b  is  inefficient  because  there  is  duplication  in  the  cal¬ 
culations  of  the  rotation  parameters  for  the  forward  prediction  and  joint  estimation  residuals.  Any  prac¬ 
tical  implementation  would  clearly  calculate  the  rotation  parameters  only  once  in  the  joint  process  esti¬ 
mation  chaimel  and  store  them  for  use  in  the  next  time  instant  in  the  forward  prediction  channel. 

In  section  2.8  we  discussed  the  Recursive  Modified  Gram  Schmidt  (RMGS)  algorithm  with  error 
feedback.  This  was  proposed  by  Ling  Manolakis  and  Proakis  [15]  for  the  general  narrowband  adaptive 
beamfonning  problem  and  leads  to  a  triangular  array  processor  equivalent  to  the  one  described  in  fig¬ 
ures  2  and  4.  Ling  and  Proakis  [  1 6]  subsequently  developed  the  technique  to  produce  an  efficient  RMGS 
algorithm  with  error  feedback  which  requires  0(p)  arithmetic  operations  per  sample  time  to  solve  a  p'*' 
order  adaptive  FIR  filtering  problem.  Their  algorithm  also  has  a  lattice  structure  and,  in  view  of  the 
equivalence  referred  to  above,  it  is  not  surprising  to  find  that  it  corresponds  exactly  to  the  QRD-based 
least  squares  lathee  algorithm  derived  in  this  memorandum  (assuming  that  the  square-root-free  Givens 
rotations  in  figure  4  are  employed).  The  RMGS  lattice  algorithm  with  error  feedback  was  the  first  nu¬ 
merically  stable  least  squares  lattice  algorithm  to  be  developed  and  it  is  interesting,  therefore,  to  note 
this  correspondence  to  an  algorithm  based  enhrely  on  orthogonal  rotations. 

3.5  The  QRD  “Fast  Kalman”  Algorithm. 

In  this  section  we  expand  on  the  remarks  made  in  section  3. 1  about  the  conneaion  between  the 
forward  and  backward  linear  prediction  problems  for  a  fixed  order  and  develop  the  QRD-based  fast  Kal¬ 
man  algorithm.  We  take  as  our  starting  point  the  situahon  where  we  have  solved  the  p'**  order  forward 
linear  predichon  problem  for  hme  n  and  are  attempting  to  update  this  soluhon  to  the  next  hme  instant. 
Specifically,  if  we  could  generate  the  matrix  Qp(n)  using  0(p)  arithmetic  operations  then  the  linear  pre¬ 
diction  solution  could  be  updated  efficiently. 

The  original  derivation  of  a  QRD-based  fast  Kalman  algorithm  was  presented  by  Cioffi  [4]  al¬ 
though  his  algonthm  differs  somewhat  from  that  presented  here.  The  material  presented  in  this  section 
follows  that  of  reference  [23]  and  leads  to  the  same  algorithm  as  derived  by  Regalia  and  Bellanger  [27]. 
Both  of  these  deri  vahons  were  based  on  Cioffi's  work  and  the  approach  presented  here  actually  follows 
the  same  sequence  of  ideas  used  in  Cioffi's  original  paper.  The  difference  in  the  resulting  algorithms  is 
due  to  the  choice  of  triangular  matrix  used  in  the  QR  decomposition.  Here  we  use  an  upper,  right-hand 
triangular  matrix  to  conform  with  the  wotk  of  Gentleman  and  Kung  [8],  Cioffi  on  the  other  hand  chose 
to  use  a  uppa,  left-hand  triangular  matrix.  This  choice  results  in  an  algorithm  which  is  slightly  more 
complex  than  the  one  derived  here.  We  refer  interested  readers  to  the  original  references  for  funher  de¬ 
tails.  As  in  the  case  of  the  latbce  algorithm,  the  solution  to  the  adaptive  filtering  problem  is  updated 
along  with  the  linear  piedicaon  one.  In  fact,  since  we  will  be  calculating  the  matrix  Qp(n) ,  there  is  clear¬ 
ly  no  need  to  include  the  adaptive  filtering  problem  explicitly  in  the  following  analysis. 

Note  from  equations  (104),  (106),  (107)  and  (108)  that 
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(112) 


Q^*,(n-l)p;  1  o’T  1 

pT  ^,pQp(n-ir 

which  can  be  viewed  as  a  time  and  order  update  relationship  for  the  matrix  Qp(n  -  1 ) .  Also  from  equa¬ 
tions  (93),  (96)  and  (97)  we  have 


Qp+i<")  *  Qp+i(n)0p+i(n)' 


Qp+  it")  =  Qp*i(n+  l)i^P+*^"^  ®  Qp(n)  (113) 

"  ^  p-^  Ij  " 

or,  given  that  the  inverse  of  an  orthogonal  matrix  is  just  its  transpose, 

r  ^ 

Q(n)=  [Qp+,(n  +  l)]^Qp^,(n)  (114) 

o’^  I 

which  is  an  order  down-date  relationship  for  Qp^  j(n) .  Taken  together,  equations  (1 12)  and  (1 14)  rep¬ 
resent.  at  least  in  principle,  a  means  of  updating  the  matrix  Qp  from  one  time  instant  to  the  next.  How¬ 
ever  we  are  really  interested  in  obtaining  a  time  update  relationship  for  the  matrix  Qp(n) .  since  the  ma¬ 
trix  Qp  operates  on  the  entire  data  matrix  and  would  therefore  lead  to  an  algorithm  which  is  not  time- 
recursive  and  requires  an  ever  increasing  amount  of  storage.  Funhermore.  since  explicit  evaluation  of 
the  relationships  derived  above  would  require  0(rr*)  multiplications,  this  approach  could  not  lead  to  a 
"fast"  algorithm. 

-  b  -  b 

The  observant  reader  will  have  noticed  that  the  matrix  product  Qp+  i(n)Qp+  i(n)  depends  on 
knowledge  of  the  p*^  forward  Unear  prediction  problem  at  time  n  (equations  (107)  and  (108))  and  thus 
is.  by  assumption,  is  known;  whereas  the  matrix  Op-f  ](n  1)  depends  on  the  solution  to  the  p"’  order 

backward  problem  at  time  n.  This  is  somewhat  paradoxical  given  that  one  of  the  purposes  of  trying  to 
calculate  Qp(n)  is  to  calculate  this  solution!  However,  as  we  shall  see.  it  is  possible  to  avoid  this  para¬ 
dox  by  doing  the  matrix  multiplications  required  by  equations  (112)  and  (114)  impUcitly.  thus  comput¬ 
ing  the  matrix  <^p(n)  efficiently  and  generating  a  fast  algorithm.  We  do  this  by  constructing  only  the 
right-hand  column  of  the  matrix  Qp(n)  and  then  inferring  the  whole  matrix  from  this  vector  (as  in  sec¬ 
tion  2.7).  This  technique  reduces  the  dimension  of  the  problem  (from  matrices  to  vectors)  and  thus 
scales  down  the  computational  requirement:  in  fact,  the  evaluation  of  equations  (112)  and  (1 14)  is  ac- 
tuaUy  reduced  to  0(p}  arithmetic  operations  in  this  case.  Note  that  the  right-hand  column  of  the  matrix 
(^p(n)  can  be  considered  to  be  the  result  of  applying  the  rotation  matrix  (^p(n)  to  the  pinning  vector 
defined  in  equation  (67).  The  occurrence  of  the  pinning  vector  in  the  calculations  is  typical  of  fast  Kal¬ 
man  algorithms. 

Returning  to  equation  (1 12).  we  see  that 
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(115) 


Jp+ 


Qp+i<"K 


Qp+i(n)Qp+,(n)  '^P*' 


0 

(n-l)p 


Yp(n-1) 


and.  from  equation  (113), 


.f 


0  I  -  Qp+i(n+l) '^p+i  .  p 


f  T!M") 


:Yp*,(n) 


*  Jp<") 


(116) 


Now  from  equation  ( 106),  it  is  clear  that  the  vector  o  ajfn  -  1)  o'’"  y^fn  -  1)  is  unaffected  by  the 

b  -T 

matrix  Qp  +  |(n  -  1 ) .  Similarly,  the  vector  aj^(n)  o^  yp(n)  is  invariant  under  the  action  of  the  matrix 

Qp4.|(n)  (see  equation  (96)).  Hence,  we  deduce  that 


ap+i<") 


0 

Yp(n-1) 


(117) 


.  /n) 


and 


ap(n) 


P+'  -t 

0  “  Qp+ 1(»>+  1)  0 


Yp(n) 


(118) 


Finally,  with  reference  to  equation  (97),  we  note  that 


Qp+  i(n  + 1) 


a„(n) 


,  »P<"> 

o 


I 


Jp*l<">, 


(119) 


Clearly,  if  the  rotation  angle  corresponding  to  Op  4.  i(n  -i- 1 )  is  6  then 


;*p-fi'"l  _  I  cos©  sin©',  0  , 
•  Yp* i(n)  i-sin©  cos©!  Yp(i) 


(120) 


38 


lFx,„-OTHEN 

Xm 

1 

c«-l;  sr-O;  r„„,  «-r„;  y  *-y 

oul  in  *oui  *in 

ELSE 

'^iil — s 

Y  4—  CY  : 

'out  'in 

^out 

ENDIF 

X,n 

St  i 

(c.s) 

j  _ ^(C.S) 

rou,^s‘*.n  +  ‘:f.n 

1 

''out 

Figure  13  Fast  Kalman  Processors 

« r 

and  hence  the  rotation  matrix  Qp^  |(n  -t- 1)  can  be  calculated  indirectly  from  the  known  quantities 
*p+  we  can  avoid  the  paradoxical  situation  of  needing  to  know  the  solution  to 

the  p'**  order  backward  linear  prediction  problem  at  time  n  before  Op(n)  is  known. 

T 

Thusby  means  of  equation  (1 17)  and  we  can  transform  the  vector  ap(n-l)o^  Yp(n-l) 

T 

the  vector  aj^  |(n)  P^Tp+ifn)  inOf/ilorthogonalopcraiions.  Equation)]  l8)thenprovidestheba- 

T 

SIS  for  an  0(p)  method  for  transforming  this  latter  vector  into  a^(n)  o^  y  (n)  and  finally  Qp(n)  can 

p  >p  y 

T 

be  calculated,  again  in  0(p)  operations,  from  fj(n)  y^fn)  as  shown  in  section  2.7.  The  resulting 

algorithm  may  be  implemented  using  the  parallel  computing  architecture  shown  in  figure  14  with  rota¬ 
tion  processors  as  defined  in  figure  13.  Note  that  these  rotation  processors  are  essentially  the  same  as 
those  used  in  the  triangular  array  and  lattice  algorithms.  The  main  difference  is  that  the  processing  de¬ 
ments  in  figure  13  do  not  store  any  internal  variables:  all  variables  are  either  passed  imo  or  out  of  the 
processing  dement.  In  fact  if  these  processing  elements  are  equipped  with  a  storage  dement  and  the 
correct  output  variable  fed  back  to  the  rdevam  input  (as  shown  in  the  left-hand  cdumn  of  cells  in  figure 
14)  then  they  are  essentially  identical  to  the  processing  dements  shown  in  figure  3. 

An  interesting  consequence  of  using  the  QRD  approach  is  that,  unlike  other  fast  Kalman  algo¬ 
rithms.  the  QRD-based  fast  Kalman  algorithm  not  only  produces  the  solution  to  a  given  order  problem 
but  dso  that  for  all  lower  order  problems.  This  is  becnise  the  QRD-based  approach  to  least  squwes  min¬ 
imisation  is  inherently  an  order  recunive  process  (see  section  3.6).  For  instance,  consider  the  triangular 
processor  array  described  in  secoon  2.3.  It  should  be  dear  that  the  quantity  being  passed  down  to  the 
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boundary  cells  (the  product  of  the  vvious  cosine  terms  from  earlier  rotations)  is  the  quantity  y(N)  re¬ 
quired  by  the  tower  order  problems.  A  small  amoum  of  thou^  will  also  reveal  that  the  angle  normal¬ 
ised  residual,  a(n).  for  a  given  lower  order  problem,  is  the  quantity  passed  down  to  a  boundary  ih-occs- 
sor  from  the  last  intenud  processor  of  that  particular  ccduron  (see  figure  16). 


It  is  interestiiig  to  note  that,  at  first  glance,  the  algorithm  pictured  in  figure  14  appears  not  to  include 
any  quantities  related  to  the  backward  hnear  prediction  problem;  however  this  is  not  true.  It  is  possible 
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to  show  (see  section  3.6)  that  the  vector  a^Cn)  consists  of  the  backward  prediction  residuals  for  orders  0 
to  p- 1  normalised  by  the  respecuve  predicuon  error  energy.  Indeed  Regalia  and  Bellango-  [27]  derived 
their  algorithm  using  these  quantities  explicitly. 

3.6  Physical  Intarpratation  of  Fast  Atgorithm  Paramatars. 

The  QRD-based  approach  to  least-squares  minimisation  is  just  one  of  many  different  ways  in 
which  the  problem  can  be  solved.  However,  the  quantities  used  in  a  QRD-based  algorithm  appear  to  be 
radically  different  from  those  to  be  found  in  the  more  familiar  approaches.  Clearly  because  the  under¬ 
lying  problem  is  the  same,  the  variables  in  a  QRD-  based  algorithm  must  be  related  to  more  conventional 
quantities.  In  this  section  we  point  out  some  of  these  relationships  along  with  some  other  interesting  in¬ 
ter-relationships  between  quantities  found  in  the  QRD-based  approach.  Indeed,  the  QRD-based  ap- 
proia:h  to  least-squares  minimisation  and  hnear  prediction  in  particular,  offers  many  useful  insights. 

We  have  already  seen  that  the  forward  and  backward  prediction  residual  powers  appear  quite  nat¬ 
urally  in  the  QRD-based  lattice  algorithm  -  albeit  in  terms  of  their  square-roots:  e'  and  (see  equa¬ 
tions  (97)  and  (107)).  The  lattice  algorithm  also  calculates  estimates  of  the  partial  correlation  coeffi¬ 
cients  [12].  Consider  the  term  Pp  _  ,  in  equation  (97):  if  the  rotation  angle  corresponding  to  Qp(n)  is 
0  and  we  let  c  *  cos0  and  s  ■  sin0  then 


c  s  _  p;.,(n)E^.,(n-)) 

-sc  af.,(n)  ot^.,(n-l)  a^fn)  0 

i.e,  =  Pcpf.,(n- D  +  sa' .,(n) 

PE^.,(n-2) 

where  c  =  / —  —  and  s  »  ^ 

Combimng  equations  ( 1 22)  and  (1 23)  we  find,  after  some  algebra,  that 

bp-i(")  “  P^fp-|(n- »  +  ap_,(n- Da^ _,(n) 


(121) 


(122) 


(123) 


(124) 


-  j;  P^‘”~"”a^.,(m-l)a'_,(m)  (125) 

m  •  1 

where  bp_j(n)  ■  ep_|(n- l)Pp_,(n)  (126) 


41 


Now,  by  comparison  with  equation  (47),  we  have 


and 


ap_,(n-l)  -  ,e^_,(n-l,n-2)e^.,(n-(,n-l) 


(127) 


(128) 


Thus  the  angle  normalised  linear  prediction  residuals  are  identical  to  the  so  called  rationalised  residu- 

als[19}.  In  particular  we  see  that  this  interpretation  of  the  angle  normalised  residuals  implies  that  the 

right  hand  side  of  equation  ( 1 23)  can  be  viewed  as  a  (weighted)  estimate  of  the  crosscorrelation  between 

the  normalised  forward  and  backward  residuals.  Indeed  for  stationary  signal  statistics,  once  a  recursive 

least  squares  prediction  algorithm  has  converged,  the  a  priori  and  a  posteriori  residuals,  and  hence  the 

,  2 

angle  normalised  residuals,  are  identical.  Finally,  as  the  backward  residual  power  _  ,  ]  is  effective¬ 
ly  an  estimate  of  the  mean  square  backward  residual,  we  see  from  equation  (126)  that 


lip-i(n)- 


,(n- 


].  n  • 


l)e'.,(n.  n)) 


(e^.,(n-l.n-l))^ 


(129) 


In  a  similar  manner,  it  is  possible  to  show,  again  from  equation  (97),  that 


[ep-|(n-  l.n-  l)ep_  ,(n-  1,  n-  D] 
(e^.,(n-l.n-I))^ 


(130) 


and,  from  equation  (107).  that 


ti^.,(n-l). 


[e[,.,(n.n)e^_,(n-l.n-l)] 
^(ep_,(n.  n))^ 


(131) 


Thus  we  see  that  u*^  ,(n).u*’  ,(n-l)andu  ,(n- I)  are  estimates  of  the  various  PARCOR  coef- 

—  J  *^p  -  I  r  p  «  I 

ficients. 

Next  we  consider  the  order-recunive  nature  of  the  QRD  approach  to  linear  prediction.  Note  that 
equations  (97)  and  (98)  provide  a  recursive  decomposition  of  the  matrix  Rp(n  -  1).  Specifically,  and 
for  time  n  rather  (n-1). 
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This  shows  that  the  diagonal  elements  of  the  matrix  Rp(n)  are  in  fact  the  square-roots  of  the  back¬ 
ward  prediction  residual  energy  terms  for  each  of  the  sub-order  problems.  It  also  indicates  why  it  is  sen¬ 
sible  for  the  Givens  rotations  used  in  QRD-based  linear  prediction  to  ensure  that  the  diagonal  elements 
of  the  R(n)  matrix  are  always  positive.  Equation  (132)  also  shows  that  the  off-diagonal  pans  of  the  tri¬ 
angular  matrix  are  the  various  "y”  vectors.  This  is  not  surprising  considering  that  the  linear  prediction 
problem  could  be  solved  using  a  full  pxp  systolic  array  with  the  time  series  x(n)  fed  in  via  a  tapped  delay 
line  as  indicated  in  figure  16.  It  therefore  serves  to  emphasise  the  faa  that  a  QRD-based  approach  gen¬ 
erates  the  solution  to  all  sub-order  problems  as  well  as  the  target  problem  of  a  given  order. 

Order  recursion  also  plays  an  importam  pan  in  any  least-squares  lattice  algorithm.  Traditionally 
the  order  recursion  for  the  prediction  residuals  takes  the  form  [12] 

e'(n.  n)  -  Cp  _  ,(n,  n) + k!,(n)ep  _  ,(n  -  1 ,  n  -  1 ) 

I  I  b  f 

c^(n,  n)  *  Cp  _  ,(D  - 1 ,  n  -  1)  +  k^(n)ej  _  ,(n.  n) 

where  kp(n)  and  kp(n)  are  the  p'*'  order  reflection  coefficients.  Unlike  the  conventional  lattice  algo¬ 
rithms.  the  QRD-based  one  derived  in  section  3.4.  does  not  explicitly  calculate  refleaion  coefficients: 
instead  the  order  update  for  the  (angle  normalised)  residuals  takes  the  form  (see  equations  (97)  and 
(107)) 
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(134) 


«p(n)  “  c^(n)a'  _  ,(n)  -  s^(n)Pn'  _  ,(n  -  1) 
a^(n)  =cJ(n)a^_j(n-  l)-s^(n)pti^_j(n-2) 


lip  -  ,('>)  =  Cp(n)PHp_,(n-  l)  +  s'(n)of  ,(n) 

where 

)ip  -  i(n  -  1 )  -  c^(r»)Pn^  -  ,(n  -  2)  +  s^(n)o^  _  ,(n  -  1 ) 


(135) 


and  Cp(n),  $p(n),  Cp(n)  and  Sp(n)  are  the  sines  and  cosines  of  the  transformations  Op(n)  and  Op(n)  re- 
speed  vely.  Equadon  ( 1 34)  appears  quite  different  to  the  usual  latdce  equations  shown  in  equation  (133): 
however,  note  that  equation  ( 1 34)  is  written  in  terms  of  angle  normalised  residuals  and  not  a-posteriori 
ones,  and  that  the  sines  and  cosines  are  functions  of  the  residuals  (see  equations  (97)  and  (107)).  Con- 
voting  from  the  angle  normahsed  residuals  to  the  a-posteriori  ones  using  equation  (45)  and  evaluating 
the  sines  and  cosines  we  obtain,  after  some  manipulation. 


ep(n,  n)  =ep.  ,(n,  n)- 


tip-|(") 


ep.,(n-  1,  n-  1) 


ep(n,  n)  =  Op  _  ,(n  -  1 ,  n  -  1)  -  - e'  _  ,(n,  n) 


(136) 


from  which  it  follows  that 


k>)  =  - 

k^(n)  =  - 


>>p-i<") 

Up- .O’-’) 

«p-|(n) 


(137) 


This  result  is  not  surprising:  the  reflection  coefficients  in  equation  (133)  are  defined  to  be  those 

n  n 

values  that  minimise  the  terms  ^  ep(m,  m)and  Y,  Cpfoi-m).  In  other  wm-ds.  the  reflection  coef- 

m  ■  1  m  a  1 

ficiems  are  the  coefficients  in  a  first  order  least  squares  minimisation  problem.  From  section  2.1  we 
know  that,  when  using  the  QRD  technique,  the  least-squares  coefficients  are  given  by 

w  »  -R-'u  (138) 


In  the  case  of  a  first  order  problem,  both  the  matrix  R  and  the  vector  u  are  scalars  and  for  the  least- 
squares  minimisation  proUem  shown  in  equation  ( 1 33)  these  quantities  equate  to  those  shown  in  equa¬ 
tion  (137).  This  equality  can  most  easily  be  seen  with  reference  to  figure  1 1.  If  the  least-squares  mini- 


misation  problem  is  one  of  first  order,  then  the  triangular  QRD  array  (cf.  figure  2)  will  be  a  single  cir¬ 
cular  processing  element  and  the  right-hand  column  will  reduce  to  a  one  square  processing  element. 
These  structures  can  easily  be  identified  in  figure  1 1  from  which  the  relationships  shown  in  equation 
( 1 33)  can  readily  be  deduced. 


As  remarked  earlier,  the  QRD-based  fast  Kalman  algorithm  is  also  unusual  in  that  it  does  not  cal¬ 
culate  conventional  quantities  (the  optimum  coefficients).  As  desaibed  in  section  3.3.  the  QRD-based 
fast  Kalman  algorithm  calculates  the  rotation  matrix  Op(n) .  This  matrix  is  then  used  as  shown  in  equa¬ 
tion  (2 1 )  to  compute  the  vector  u^fn)  and  produce  the  adaptive  filtering  residual.  However,  we  note  that 
the  vector  Up(n)  is  merely  a  transformed  version  of  the  optimum  coefficients  Wp(n),  as  discussed  above. 


Another  unusual  feature  of  the  algorithm  derived  in  section  3.5  is  that  it  does  not  appear  to  make 
use  of  any  quantities  related  to  the  backward  prediction  problem.  This  is  not  true  however  since  the  vec¬ 
tor  gp(n)  may  be  interpreted  in  terms  of  the  backward  prediction  residuals.  To  see  the  equivalence,  con¬ 
sider  equations  (1 19).(  120)  and  (123):  it  is  clear  that 


8p(n) 


Yp_,(n)sin© 


=  |Yp-i<n)«p-i(")|  = 


ap.,(n) 

ep.,(n.n)| 


(139) 


and  we  see  that  the  vector  ap(  n)  consists  of  the  energy  normalised  backward  prediction  residuals  of  order 
(p- 1 )  and  below. 

3.7  Weight  Extraction  from  Fast  Algorithms 

The  QRD-based  least  squares  lattice  and  fast  Kalman  algorithms  presented  above  are  based  on  the 
"direct  residual  extraction"  technique  and  as  such  produce  the  adaptive  filtenng  residual  without  explic¬ 
itly  calculating  the  optimum  weight  vector.  This  is  highly  desirable  in  an  adaptive  filtoing  context  since 
the  residual  is  the  quantity  of  interest;  however  in  system  identification  the  primary  goal  is  the  calcula¬ 
tion  of  the  weight  vector.  The  two  weight  extraction  techniques  presented  for  the  full  QRD-based  algo¬ 
rithm  -  weight  flushing  (section  2.6)  and  parallel  weight  extraction  (section  2.7)  -  are  equally  applicable 
to  the  fast  algorithms.  Note,  however,  that  neither  of  the  fast  algorithms  explicitly  calculates  the  trian¬ 
gular  matrix  R  so  that  the  back-substitution  method  (equation  ( 1 1 ))  is  not  available. 

Both  the  lattice  and  fast  Kalman  algmithms  use  the  same  processing  elements  as  the  full  triangular 
QRD  array  and  can  therefore  be  operated  in  the  frozen  mode.  If  a  unit  impulse  is  fed  into  the  frozen 
filter,  its  output  will  clearly  be  the  impulse  response  of  the  system  i.e.  the  set  of  filter  weights.  Note  how¬ 
ever  that  unlike  the  narrow-band  beamformer  of  section  2,  the  adaptive  filters  presented  here  are  not 
memoryless  systems  when  operated  in  their  frozen  mode;  the  (implicit)  tapped  delay  line  must  continue 
to  opo'ate.  It  is  therefore  necessary  to  ensure  that  the  tapped  delay  line  is  full  of  zeros  before  applying 
the  unit  impulse.  Thus  it  would  require  an  input  time  series  of  length  2p+l  consisting  of  a  single  sample 
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of  value  unity  sandwiched  between  two  sets  of  p  consecutive  zero  samples. 

Although  this  operation  will  produce  the  filters  impulse  response,  it  is  an  invasive  procedure  - 
again  because  of  the  fact  that  the  system  has  memory.  Having  frozen  the  filter  and  passed  the  impulse 
sequence  through  it  the  state  of  the  filter  (the  contents  of  the  tapped  delay-line)  will  have  been  altered 
compared  to  the  point  in  time  when  the  filter  was  frozen.  Thus  it  is  not  now  possible  to  unfreeze  the 
filter  and  continue  with  the  adaption  process  as  if  nothing  had  happened.  If  the  system  is  allowed  to 
adapt  from  this  incorrect  filter  state  then  the  output  will  be  in  error  -  until  sufficient  data  has  been  proc¬ 
essed  so  that  the  error  in  the  filter's  state  has  decayed  away.  This  may  well  be  acceptable  in  certain  sit¬ 
uations  since  this  process  of  converging  to  the  required  solution  is  exactly  what  happens  when  the  filter 
is  first  started.  One  way  in  which  weight  flushing  can  be  made  non-invasive  is  for  the  contents  of  the 
filter  delay  elements  to  be  stored  before  the  weight  flushing  begins  and  the  restored  before  the  adaption 
continues. 

Recall  from  section  2.7.  that  in  order  to  extract  the  filter  weights  in  parallel  with  the  adaption  proc¬ 
ess,  It  is  necessary  to  have  available  the  rotation  matrix  0(n).  This  matrix  is  explicitly  calculated  in  the 
fast  Kalman  algorithm  and  indeed  is  also  available  in  the  lattice  algori’hm  m  a  disguised  form:  it  can  be 
shown  that  the  rotation  matrices  Qj(n)  ( 1  S  i  S  p)  are  equal  to  the  elementary  Givens  rotations 
G,(n)  (I  S  i  S  p)  that  make  up  the  matrix  0(n)  -  see  equation  (30).  Thus  it  is  possible  to  use  the  ad¬ 
ditional  hardware  shown  in  figure  8  in  conjuncticn  with  the  fast  algorithm  (instead  of  the  triangular 
processor  array)  to  generate  the  weights.  However,  the  utility  of  this  approach  is  questionable  since  the 
addition  of  the  extra  processing  means  that  the  computational  load  of  the  complete  algorithm  rises  from 
0(p)  to  0(p‘)  and  the  advantage  of  the  fast  algorithm  is  lost. 

3.8  Computer  Simulation. 


The  main  concern  with  "fast"  algorithms  is  that  they  are  potentially  more  sensitive  to  numerical 
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errors  than  their  generic  couterparts.  This  is  because  the  fast  algorithms  exploit  some  mathematical  re¬ 
lationship  between  various  quantities  in  the  generic  algorithm  in  order  to  reduce  the  computational  load. 
In  the  case  of  the  least-squares  lattice  algorithms,  the  assumption  that  the  problem  has  already  been 
solved  for  the  one  order  allows  the  solution  to  the  next  order  problem  to  be  generated  efficiently.  Now 
in  practice,  the  calculations  can  only  be  done  to  a  finite  accuracy  so,  strictly  speaking,  the  assumptions 
upon  which  the  fast  algorithms  are  based  (e.g.  the  existence  of  the  solution  to  the  lower  aria  problems) 
are  only  gqiproximately  true.  We  can  therefore  expea  to  pay  some  penalty,  in  terms  of  numerical  stabil¬ 
ity.  for  the  reduced  computational  load.  The  perceived  advantage  with  the  fast  QRD-based  algorithms 
is,  of  course,  that  they  should  be  more  robust  in  the  presence  of  numerical  errors  than  other  fast  algo¬ 
rithms. 

In  order  to  investigate  the  effects  of  finite  precision  on  the  fast  QRD-based  algorithms  we  consider 
the  application  of  an  adaptive  filter  to  a  typical  channel  equalisation  problem  (figure  17).  In  particular, 
we  consider^  the  case  of  an  adaptive  equaliser  applied  to  a  data  channel  with  a  “raised  cosine"  impulse 
response  (equation  (140)). 


h(n)  = 


^  1  +C0S  (^(n-2)) 


n=  1,2,3. 


0 


otherwise 


(140) 


By  varying  the  parameter  W.  the  amount  of  interference  between  a  given  symbol  and  the  two  either  side 
of  it  can  be  changed.  This,  in  effect,  controls  the  eigenvalue  spread  of  the  data  covariance  matrix  (see 
table  2). 

An  1 1"'  order  adaptive  QRD-based  least-squares  adaptive  filter  is  used  to  equalise  the  channel  re¬ 
sponse.  In  the  simplest  of  situauons.  the  equaliser  would  be  trained  periodically  by  transmitting  a  known 
sequence  and  adapting  the  equaliser  with  a  stored  version  of  this  signal  as  the  "reference"  signal  (see 
section  3.1).  In  baween  these  training  sessions,  with  the  adaption  frozen,  the  channel  can  be  used  for 
the  transmission  of  data,  hopefully  with  the  inter-symbol  imerference  much  reduced. 

In  our  computer  experiment,  the  transmission  charmel  is  fed  with  a  polar  ( ±  1 )  pseudorandom  train¬ 
ing  sequence.  This  sequence,  delayed  by  seven  time  instants,  is  used  as  the  reference  signal  for  the  adap¬ 
tive  filtering  algorithm.  The  delay  is  insened  to  ensure  that  the  adaptive  filter  has  an  impulse  response 
that  IS  symmetrical  about  the  centre  tap.  A  small  quantity  of  “measurement"  noise,  in  the  form  of  a  pseu¬ 
dorandom  sequence  with  an  approximately  gaussian  probability  distribution  function,  is  added  to  the 
channel  output.  The  noise  sequence  used  has  zero  mean  and  a  variance  of  0.001 .  The  “forget  faaor”  ^ 
(see  equation  (4))  was  fixed  at  a  value  of  0.996.  which  implies  an  effecbve  data  window  (i.e.  the  dura¬ 
tion  for  which  any  data  veaor  has  an  effect  on  the  filter  adaption)  of  250  time  samples. 


6  Suggested  by  S  Haykin. 
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^^max^min* 

2.9 

6.1 

3.1 

11.2 

3.3 

21.9 

3.3 

47.5 

Table  2  Eigenvalue  Spread 

All  calculations  within  the  algorithm  were  performed  using  limited-precision  floating  point  arith¬ 
metic.  Only  the  number  of  bits  in  the  mantissa  is  varied  during  the  experiments:  the  number  of  bits  in 
the  exponent  is  fixed  at  eight  No  quantities  internal  to  the  adaptive  Altering  algorithm  are  held  to  a 
greater  precision  than  .  ^uts:  the  results  of  all  arithmoic  operations  are  immediately  reduced  to  the 
required  precision.  The  numerical  performance  of  most  algorithms  can  be  improved  by  using  higher 
precision  for  some  internal  calculations:  however  it  is  necessary  to  have  a  good  understanding  of  the 
algorithm  and.  in  particular,  to  identify  the  critical  intermediate  quantities  for  this  method  to  be  used 
effectively. 

The  performance  of  the  equaliser  is  monitored  by  recording  the  ensemble-averaged,  squared  a-pri¬ 
ori  equahsation  error  (see  equation  (40)).  This  has  the  advantage  that  it  shows  how  close  to  convergence 
the  algorithm  is  whilst  still  showing,  asymptotically,  the  least  squares  equalisation  error.  The  ensemble 
average  is  taken  over  100  realisations  of  the  experiment.  Several  experiments  were  performed  using 
various  combinations  of  parameters  and  algorithms  however  only  the  main  results  are  discussed  below. 
Care  should  be  taken  in  the  interpretation  of  any  computer  simulation  expoiment.  In  particular,  when 
the  numerical  stability  of  an  algorithm  is  being  investigated  it  should  be  noted  [3]  that  instability  is  often 
preceded  by  a  period  of  apparent  stability  (e.g.  flgure  19. 8  bit  mantissa  plot).  Thus  simulation  experi¬ 
ments  can  only  confirm  lower  bounds  on  the  sequence  length  required  to  cause  instability  and  not 
"prove"  that  a  system  is  stable. 

Figures  1 8  and  1 9  show  the  basic  performance  of  the  fast  QRD-based  equaliser  algorithms,  using 
the  square-root,  feedforward  Givens  rotations  (SQ/FF),  for  different  values  of  wordlength  and  eigenval¬ 
ue  spread.  Figure  18  shows  that,  with  double-precision  arithmetic,  the  rate  of  convergence  is  more  or 
less  insensitive  to  the  different  eigenvalue  spread  settings:  as  would  be  expected  from  a  recursive  least 
squares  minimisation  process.  Figure  19  illustrates  how  the  wordlength  affects  the  performance  for  a 
Axed  eigenvalue  spread  (W-2.9).  There  is  very  little  discemable  difference  between  the  Alters  using  1 2, 
16  and  56  (IEEE  double-precision)  bit  mantissas  and  these  Alters  appear  to  be  well  behaved  for  data 
sequ«ices  of  length  up  to  200.  The  Arst  sign  of  any  instability  appears  with  a  mantissa  of  8  bits.  Here 
the  Alters  initially  show  signs  of  converging  to  a  stable  state  but  then  begin  to  diverge  producing  an  ever 
increasing  error.  The  4  bit  systems  clearly  do  not  behave  in  a  sensible  manner  as  would  be  expected 
Aom  such  a  short  wordlength.  Note  that  there  is  very  little  difference  between  the  two  fast  algorithms 
(the  lattice  and  the  fast  Kalman). 

Figure  20  shows  a  comparison  of  the  QRD-based  lattice  algorithm  with  the  full  QRD-based  trian- 
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Figure  18(b)  QRO  fast  Kalman;  Effect  of  Eigenvalue  Spread. 
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gular  systolic  array  version.  Four  systems  are  also  shown  in  this  figure:  they  are  the  "square-root-free 
with  feedback"  (SF/FB)  forms  of  the  lattice  algorithm  and  the  array  algorithm  along  with  the  SQ/FF 
versions.  This  figure  shows  the  case  of  4  bit  mantissas  and  a  fixed  eigenvalue  spread  setting  (W=2.9). 
This  may  be  considered  to  be  an  excessively  shcxt  wordlength.  The  reason  for  this  choice  is  that  often 
finite  precision  effect  are  often  only  manifested  after  the  round-off  errors  have  had  time  to  accumulate 
[3].  By  using  a  small  wordlength,  the  appearance  of  such  effects  occur  sooner  thus  reducing  the  time 
necessary  to  perform  the  simulation. 

In  most  cases,  a  p***  order  RLS  adaptive  filter  will  converge  within  2p  time  instants.  At  this  point 
the  a-priori  residual  will  have  reached  a  value  primarily  determined  by  the  eigenvalue  spread  and  not 
the  wordlength.  As  round-off  errors  accumulate,  the  a-priori  error  will  increase  indicating  a  loss  of  ac¬ 
curacy  in  the  algorithm.  Up  to  a  run  length  of  10000.  the  longest  simulation  run  to  date  by  the  authors, 
the  SQ/FF  lattice  algorithm  remains  stable  with  12  bit  mantissas.  In  the  case  of  the  SF/FB  lattice,  the 
same  behaviour  is  seen  using  only  4  bit  mantissas. 

It  can  be  seen,  in  figure  20.  that  in  the  SQ/FF  mode  the  faster,  lattice  algorithm  is  only  marginally 
worse  than  the  full  triangular  array  version  thus  demonstrating  that  httle  penalty  has  been  paid  in  reduc¬ 
ing  the  computational  load.  As  expected,  the  square-root-ftee  with  feedback  versions  of  the  algorithms 
perform  better  than  the  basic  versions.  In  this  case,  there  was  no  discemable  difference  between  the  lat¬ 
tice  version  and  the  array  version  in  any  of  the  simulations  run  so  far.  This  would  seem  to  demonstrue 
the  power  of  the  feedback  technique  in  improving  the  numerical  accuracy  of  these  algorithms. 

The  relative  effect  of  the  square-root-ffee  and  the  feedback  techniques  can  be  t^wn  in  figure  21 . 
This  shows  the  performance  of  the  lattice  algorithms  with  4  bit  mantissas  and  fixed  eigenvalue  spread 
setting  (W=2.9)  for  the  six  possible  Givens  rotation  algorithms:  SQ/FF.  SF/FB,  square-root  Givens  ro¬ 
tations  with  feedback  (SQ/FB).  square-root-free  feedforward  rotations  (SF/FF).  square-root  rotations 
with  the  stored  parameter  fedback  (SQ/MFB).  and  square-root-free  routions  with  the  stored  parameter 
fedback  (SF/NfFB)  -  see  section  2.4  for  more  details.  From  this  it  can  be  seen  that  there  is  indeed  a  nu¬ 
merical  advantage  to  avoiding  the  square-root  opo'ation  but  that  the  most  significam  improvement 
comes  about  by  introducing  the  "error  feedback"  -  provided  that  the  feedback  is  applied  properly.  Figure 
21  shows  that  there  is  little  difference  in  performance  between  the  "FF’  and  "MFB"  versions  whereas 
the  "FB  "  versions  (i.e.  the  "error  feedback"  algorithms)  are  sigmficantly  beat 

In  conclusion,  we  have  found  that  for  both  the  triangular  QRD  processor  in  figure  2  and  the  least 
squares  lattice  filter  in  figure  12.  the  best  numerical  performance  over  a  wide  range  of  computer  simu¬ 
lations  was  obtained  using  the  SF/FB  Givens  rotations  defined  in  figure  4.  The  faa  that  a  square-root- 
free  algorithm  performs  best  is  not  entirely  surprising.  A  closer  analysis  of  the  conventional  Givens  ro¬ 
tation  algorithm  defined  in  figure  3  shows  that  the  square  root  operation  is  only  required  in  situations 
where  we  sum  the  squares  of  two  numbers  and  then  compute  their  square  root.  Avoiding  this  process 
could  certainly  improve  the  numerical  performance.  The  to  that  the  feedback  form  of  square-root-ffee 
Givens  rotation  in  figure  4  performs  best  is  contrary  to  the  initial  expectation  of  numerical  analysts  [11]. 
However,  it  is  entirely  consistent  with  the  fact  that  the  corresponding  error  feedback  RMGS  lattice  al- 
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gonthm  has  also  been  found  to  exhibit  robust  numerical  stability.  Since  Ling.  Manolakis  and  Proalcis 
were  first  inspired  to  introduce  the  error  feedback  mechanism  by  analogy  with  the  stability  techniques 
used  in  control,  it  would  appear  that  this  provides  a  better  way  of  analysing  the  numerical  stability  of 
other  signal  processing  algorithms. 

We  have  repeated  the  above  experiments  with  values  of  the  W  parameter  other  than  2.9  and  all  of 
the  above  observations  appear  to  hold  essentially  independently  of  the  eigenvalue  spread. 

4  Wide-band  Beamforming. 

4.1  MuRhchanml  Adafitive  FINars 

In  a  multi-channel  least  squares  adaptive  filtering  problem  at  time  n.  a  set  of  N  p-dimensional 
weight  vectors.  w^'\n)  (0  $  i  ^  N-1).  is  to  be  found  that  minimises  the  sum  of  the  squared  diffimnces 
between  a  reference  signal  y(n)  and  a  linear  ctnnbination  of  N  samples  from  each  of  p  dau  time  series 
>ti(tn-j)  (l^i^p.  0^ji£N-l).  This  is  equivalent  to  adaptively  fihenng  p  separate  time  series  in  order 
to  form  the  best  esdmtteofthe  reference  signal  (see  figure  21).  If  the  p  dau  sequences  come  from  spa¬ 
tially  separate  antennae  then  we  have  a  spttial  as  well  as  a  temporal  filtering  problem.  In  this  sense,  the 
muhi-channel  adaptive  filtering  problem  subsumes  both  the  narrow-band  beamforming  problem  and  the 
(single  channel)  adaptive  filtering  problem. 

To  be  specific^,  the  measure  E„(y'^)  ■  ,  e^n)  ^  is  to  be  minimised,  where: 

ej,(n)  -  X^n)w’„  +  y(n)  (141) 
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>■*‘(1)...  x'^(2-N) 

XN(n)»B(n)i  ;  (142) 

jc'^(n)  ...  x''‘(n-N  +  l) 

x^(n)  »  jx,(n)  Xj(n)  ...  Xp(n)  (143) 

w'N(n)=  (144) 

and  y(n)  *  B(n)[y(l) . y(n)]^  (145) 

Equation  (143)  serves  to  define  the  new  vector  quantity  x(n).  Note  that,  apart  fi'om  the  change  from 
scalar  to  vector  quantities,  equation  ( 143)  is  identical  to  equation  (79).  Indeed  it  would  be  possible  to 
consider  the  case  where  the  reference  signal  y(n)  is  replaced  by  several  such  signals.  In  this  case  we 
would  have  to  replace  the  vector  y(n)  by  a  matrix.  We  will  not  pursue  the  idea  of  multichannel  joint 
process  estimation  any  further  here  but  note  that  a  similar  situation  arises  naturally  in  section  4.2  when 
we  consider  mulu-channel  linear  prediction. 

The  solution  of  this  vector  least  squares  minimisation  problem  via  QR  decomposition  follows  the 
usual  pattern  and  requires  the  determination  of  an  orthogonal  matrix  Qi>j(n)  that  transforms  the  matrix 
X|si(n)  into  upper  triangular  form.  The  fact  that  the  matrix  Xj^n)  is  block-Toeplitz  aUows  us  to  use  the 
ideas  developed  in  section  3  to  construct  "fast"  algonthms.  As  one  might  expect,  it  is  possible  to  derive 
both  a  fast  Kalman  and  a  lattice  algorithm. 

4^  MuM-chamwl  Laltlc* 

The  extoision  of  the  lattice  algorithm  presented  in  section  3.4  to  the  wide-band  beamforming  prob¬ 
lem  is  relatively  straight  forward:  the  only  change  required  is  that  ceitain  scalar  quantities  be  replaced 
by  vectors  and  some  veaors  be  replaced  by  matrices(24}.  The  essential  features  of  the  derivation  pre¬ 
sented  in  section  3.4  cany  over  exactly.  The  only  poim  where  the  derivation  of  the  multi-channel  case 
deviates  in  any  appreciable  way  from  that  given  in  section  3.4  is  in  the  extension  to  p  dimensions  of 
operations  on  one  dimensional  objects. 


7  To  be  rigorous,  we  should  label  quantities  involved  with  a  p-channel  N-tap  multi-channel  least  squares 
minimisaticn  problem  with  two  indices  (viz  p  arxl  N).  However  in  the  following  we  will  be  consider¬ 
ing  o^y  iterations  in  the  nunriier  of  taps  md  not  the  nunber  of  channels  Thus  for  notational  simplicity 
we  will  indicale  explicitly  only  the  number  of  taps  being  considered  -  the  number  of  channels  being 
anumed  to  be  fixed 
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In  the  solution  of  the  p‘"  order  single-channel  forward  linear  prediction  problem  it  was  necessary 
to  determine  the  rotation  matrix  Qp(n-  1)  that  annihilated  all  but  one  component  of  the  vector  yp.,(n-2) 
(see  equation  (96)).  In  the  N"*  order  multi-channel  forward  linear  prediction  problem  we  have  to  deter¬ 
mine  a  similar  rotation  matrix.  However  in  this  case,  the  vector  yp.,(n-2)  is  replaced  by  a  suitably  de¬ 
fined  matrix  Vj^.](n-2)  (with  p  columns  -  one  for  each  channel).  The  equivalent  operation  to  that  of 
Qp(n  -  1 )  is  to  convert  VN.i(n-2)  into  an  upper-triangular  matrix  i.e.  to  perform  a  QRD  decomposition! 
Indeed  the  operation  in  the  single-channel  algorithm  can  be  considered  to  be  a  QRD  decomposition  on 
a  vector  and  the  resulting  single  non-zero  component  to  be  a  1x1  triangular  matrix.  Similarly,  the  next 
step  in  the  derivation  (cf.  equation  (97))  consists  of  calculating  the  p  rotations  necessary  to  perform  the 
recursive  QRD  update  on  the  above  triangular  matrix,  instead  of  just  one  rotation. 

The  resultant  architecture  (see  figure  22)  has  a  lattice  structure  where  each  stage  of  the  lattice  con¬ 
tains  two  triangular  systolic  arrays.  The  total  number  of  operations  necessary  to  solve  an  I'i"’  order  mul¬ 
ti-channel  adaptive  filtering  problem,  with  p  channds.  is  thus  0{Np^).  Note  that  in  figure  22  some  of 
the  vector  dau  lines  have  been  “twisted".  This  merely  signifies  the  fact  that  dau  is  fed  into  the  triangular 
arrays  in  the  order  specified  by  the  mathematics*. 

The  architecture  shown  in  figure  22  is  imuiovdy  satisfying  for  the  following  reasons.  It  is  well 
known  [28]  that  the  lattice  structure  of  linear  prediction  algorithms  is  inherem  to  the  proUem.  Indeed  it 
is  easy  to  show  in  more  general  terms  diat  a  p*^  order  linear  prediction  problem  can  be  solved  using  a 
lattice  structure  which,  in  effea.  solves  two  least-squares  minimisaoon  problems  at  each  stage.  These 
least-squares  nunimisation  problems  relate  to  the  determination  of  the  forward  and  backward  reflection 
coefficients  and  calculate  the  order-updated  residuals.  Specifically,  for  a  multi-channel  problem. 


8  It  IS  easy  lo  show  that  this  dau  twist  in  not  necessary  since  a  pennutabon  erf  the  dau  does  not  affect 
the  value  of  the  residualt;  however  we  do  not  pursue  this  idea  any  further  in  this  menKxwidum 
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(146) 


CpCn,  n)  «  e'  _  |(n,  n)  -  Kp(n)ep  _  ,(n  -  1 .  n  -  1 ) 
ep(n,  n)  »  _  j(n  -  1 ,  n  -  1)  -  Kp(n)e'  _  ,(n.  n) 

where  e^ln,  n)  and  a^in,  n)  are  the  p*  order  f(»^ard  and  backward  a-posteriori  residual  vectors 
and  Kp(n)  and  Kp(n)  are  pxp  reflection  coefflctem  matrices,  respectively.  As  we  have  shown,  each  of 
the  triangular  arrays  depicted  in  figure  22  is  capable  of  performing  a  recursive  least  squares  minimisa¬ 
tion  and  calculating  the  residual  directly.  A  close  look  at  figure  22  will  show  that  these  triangular  arrays 
(iterate  on  the  forward  and  backward  residual  vectors  in  precisely  the  manner  required  to  solve  these 
problems  (cf.  discussion  following  equation  (137)). 

An  alternative  method  [33]  for  deriving  the  multi-channel  QRD-based  lattice  algorithm  actually 
begins  with  the  standard  multi-channel  lattice  algorithm  and  transforms  it  into  a  purely  orthogonal 
square-root  information  algorithm.  In  the  “standard"  RLS  lattice  algorithm,  the  forward  prediction  re¬ 
siduals  for  order  p  (say)  are  found  by  subtracting  linear  combinations  (reflection  coefficiems)  of  the 
(p-1)”  order  backward  prediction  residuals  from  the  (p-l)”  order  forward  residuals.  A  similar  relation¬ 
ship  holds  for  the  p'*'  order  backward  residuals.  The  calculation  of  the  reflection  coefficients  requires 
an  inversion  of  the  data  covariance  matrix  which  is  computationally  expensive  and  often  ill-condi¬ 
tioned.  Using  a  Cholesky  decomposition  of  the  data  covariance  matrix,  Lewist  1 3]  showed  how  this  pan 
of  the  algonthm  could  be  transformed  into  a  recursive,  square-root  information  process.  In  this  algo¬ 
rithm  the  reflection  coefficients  are  no  longer  calculated  explicitly:  quantities  analogous  to  the  vector  u 
in  equation  (II)  and  the  veaor  a  in  equation  (91 )  are  calculated  instead.  Nevertheless  the  p'*'  order  re¬ 
siduals  are  still  calculated  in  terms  of  the  difference  between  the  (p-1 )"  order  residuals  and  another  term 
(now  a  funaion  of  the  quantities  analogous  to  u  and  a).  As  the  bulk  of  the  calculation  is  exactly  the 
computation  of  the  reflecbon  coefficients.  Lewis  proceeded  no  further  with  this  re-formulation  and  ap¬ 
parently  failed  to  notice  that  the  "non-orthogonal"  pan  of  his  algorithm  is  in  fact  redundant.  Yang  and 
BOhme  [33]  observed  that  the  adaptive  filtering  residuals  were  effectively  being  produced  along  with 
the  computation  of  the  vectors  u  and  a  -  as  shown  in  section  2.5:  this  observation  results  in  the  con¬ 
struction  of  a  purely  orthogonal  algorithm  and  is  equivalent  to  that  derived  from  first  pnnciples  here. 

4  J  MuRi-channel  Fast  Kalman  Algorithm 

The  derivation  of  the  multi-channel  fast  Kalman  algorithm  is  somewhat  more  difficult  than  the  lat¬ 
tice  equivalent.  If  the  various  substitutions  of  scalars  foe  vectors  and  veaon  for  matrices  are  earned  out 
then  it  is  relatively  easy  to  generate  a  'Yast”  algorithm.  However  an  operation  count  will  reveal  that 
CHNp^}  operations  are  required  to  solve  a  p  channel  n"*  order  problem.  Assuming  that  N>p.  this  is  “fast" 
compared  to  the  0(N^p^)  operatitms  requited  when  using  a  generic  triangular  systolic  airay  fed  by 
tapped  delay-lines  but  falls  short  of  the  OfNp^)  operations  required  by  the  lattice  algorithm.  It  is  possi- 
ble(2]  to  generate  ui  0(Np^)  multi-channel  fast  Kalman  algorithm  and.  as  in  the  single-channel  case, 
the  pinning  veaor  is  used  to  aDow  the  inference  of  a  rotation  matrix  from  veaor  quantities.  As  before 
this  technique  reduces  the  problem  by  one  dimension  and  produces  an  O(Np^)  algorithm. 

The  “bottle-neck"  in  the  ludve  generalisation  of  the  single-channel  algorithm  (with  0(Npr^)  opera- 
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tions)  is  the  calculation  equivalent  to  that  shown  in  equation  (108).  For  convenience  we  reproduce  this 
equation,  in  pan,  below; 


j^(n)  o”''  ! 

i(n)i»>>  Rp("- »)  =  I  (147) 

;  o  o  i  L  o  j 

u  0  o"  ^ 

In  the  multi-channel  equivalent,  the  vector  11^(0)  is  replaced  by  an  Npxp  matrix  U^(n)  (say)  and  the 
scalar  Ep(n)  by  a  pxp  triangular  matrix  ( E{|j(n)).  The  operation  of  annihilating  this  Npxp  matrix  -  equiv¬ 
alent  to  a  block  recursive  QRD  update  -  requires  0(Np-^)  Givens  rotabons:  it  requires  0(p)  operabons 
to  annihilate  a  p-dimensional  vector  by  rotabon  against  a  pxp  triangular  matnx  and  the  matrix  U^(n) 
has  Np  rows. 

An  altemabve  procedure  for  anruhiiabng  the  matrix  uj^(n)  is  to  eliminate  it  one  column  at  a  bme 
rather  than  one  row  at  a  bme  as  in  the  recursive  QRD  update  (see  seaion  2.2).  Each  column  of  ujg(n) 
has  Np  components  and  therefore  will  require  0(Np)  operabons  to  annihilate  it  If  this  was  aU  that  was 
required  we  would  have  a  fast  algorithm:  Ujijfn)  has  p  columns  making  a  total  of  O(Np^)  operabons. 
However,  it  is  not  sufficient  for  each  column  of  the  matrix  ujj(n)  just  to  be  annihilated  individually: 
the  rotations  that  annihilate  a  given  column  must  also  be  applied  to  the  other  columns.  Columns  that 
have  previously  been  annihilated  clearly  will  not  be  affected  by  this  operabons  but  the  non-zero  ones 
will  be.  Thus  the  rotabons  that  annihilate  the  fiist  column  must  be  applied  to  the  (p-1)  other  columns: 
the  rotations  that  annihilate  the  second  column  wiU  have  to  be  applied  to  (p-2)  other  columns,  and  so 
on.  This  sequence  of  steps  clearly  requires  O(Npr’)  operabons. 

In  the  above  scheme,  the  column  vectors  of  l!jj(n)  arc  subject  to.  in  general,  several  rotations.  This 
IS  exaeby  the  situabon  we  faced  in  the  construebon  of  the  single-channel  fast  Kalman  algorithm  (see 
equabons  (112)  and  (113)).  The  solution  to  this  proUem  was  to  use  the  pinning  vector  to  effeebvely 
condense  the  rotabons  down  to  a  single  one.  Using  this  idea  in  the  mulb-channel  case  leads  to  an  O(Np^) 
algorithm.  The  sequence  of  operabons  is  then  as  follows:  The  left-hand  column  of  the  matrix  uj^n)  is 
annihilated  (0(Np)  operabons)  and  this  rotabon  is  applied  to  the  second  column  and  a  pinning  vector 
(0(Sp}  operabons).  The  transformed  second  column  is  then  annihilated  {0(Np)  operabons)  and  the  ro- 
tttions  applied  to  the  transformed  pinning  vector  (0(Np)  operabons).  The  doubly  bansformed  pinning 
vector  can  then  be  “unrotated"  (0(Np)  operabons)  to  generate  a  rotabon  that  is  equivalent  to  the  com¬ 
bined  effect  of  the  earlier  pair  of  rotabons.  This  combined  rotabon  is  then  applied  to  the  third  column 
and  another  pinning  veaor  and  the  process  conbnues.  At  each  stage  only  0(Np)  operabons  are  required 
and  thus  all  p  columns  of  u{[^n)  can  be  annihilated  in  0{Np^}  operabons. 

A  more  detailed  study  of  the  mulb-channel  fKt  Kalman  algorithm  shows  that  it  is.  in  effect  just 
the  appiicabon  of  the  angle  channel  algorithm  many  bmes.  It  should  be  clear  from  the  above,  that  the 
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Figure  23  Multi-channel  fast  Kalman  Algorithm 


annihilation  of  one  of  the  columns  of  the  matrix  UJ^(n)  involves  the  three  steps:  the  appUcation  of  a 
luiown  rotation,  annihilation  of  a  vector  and  subsequent  calculation  of  the  combined  rotation  based  on 


the  rotated  pinning  veaor.  In  the  single  channel  algorithm,  exactly  the  same  type  of  operations  are  re¬ 
quired:  the  known  rotation  is  Qp(n)  and  the  resultant  rotation  is  Qp(n  +  1 ) .  The  multi-channel  algorithm 
thus  has  the  structure  shown  in  figure  23.  Each  pass  of  the  •‘single-channel"  algorithm  requires  0(Np) 
operations  (since  the  matrix  uj^n)  has  columns  of  dimension  Np)  and  p  such  passes  are  required  (be¬ 
cause  the  matrix  U{^(n)  has  p  columns)  so  the  final  operation  count  is  O(Np^). 
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6  Appendix 

The  following  algorithms,  written  in  pseudo-ALGOL,  are  for  a  narrow-band  beamformer  and  a 
single  channel  adaptive  filter  only.  Two  narrow-band  beamformer  algorithms  are  presented:  the  first  al¬ 
gorithm  uses  the  obvious,  feedforward  implementation  of  Givens  rotations  using  square-roots:  the  sec¬ 
ond  one  avoids  taking  square-roots  by  calculating  transformed  quantities  and  implements  the  rotations 
via  the  feedback  algorithm.  Both  the  least  squares  lattice  and  fast  Kalman  algorithms  are  given  but  only 
using  the  conventional  Givens  rotations  using  square-roots.  Based  on  the  diagrammatic  representations 
of  the  algorithms  and  the  interchangeability  of  the  ptocessing  elements,  it  should  be  possible  for  the 
reader  to  generate  any  of  the  “missing"  algorithms.  In  the  same  spirit,  it  should  be  clear  how  to  modify 
these  algonthms  to  include  such  aspects  as  parallel  weight-extraction. 

The  computation  count  down  the  nght-hand  side  of  the  page  assumes  that  the  signals  being  proc¬ 
essed  are  real  -  although  the  mathematics  is  written  assuming  complex  quantities.  The  complexity  of 
division  has  been  equated  to  that  of  square-rooting  and  multiplication  by  the  exponential  weighting  fac- 
r  p  IS  counted  as  one  general  purpose  multiply  -  which  may  not  be  the  case  if  P  is  chosen  to  be  of  a 
Mmple  form  such  as  1  -  2".  Note  that  the  algorithms  are  not  optimised:  the  computational  load  could 
be  reduced  by  rewriting  the  algorithm  to  take  advantage  of  intermediate  quantities  common  to  two  or 
more  calculations;  such  compact  forms  of  the  algorithms  are  not  presented  here,  in  order  that  the  regu¬ 
larity  of  the  calculation  should  not  be  obscured. 

The  narrow  band  beamformer  algorithms  take  as  inputs  a  p-dimensional  vector  of  auxiliary  signals 
x(t)  and  a  reference  sequence  y(t)  and  calculate  the  beamformer  residual.  The  adaptive  filtering  algo¬ 
nthms  calculate  the  filter  residual  for  a  p'*"  order  system  fed  with  a  pre-windowed  data  sequence  x(t) 
and  a  reference  sequence  y(t). 

Note:  r^  |(t)  is  the  (i.j)"’  component  of  the  matrix  r(t); 
x_(t)  IS  the  i'*’  component  of  the  vector  )((t). 


6.1  SQ/FF  Narrow-band  Beamformer  Algorithm 

START 

add/ 

mult  sqrt/ 

INITIALISE  lall  variables  :=  0); 

subi 

divide 

FOR  t  FROM  I  DO 

LET  oi(t):-y(t);  y(t):=l: 

FOR  1  FROM  I  TO  p  DO 

LETrjj(t):=,.P^[r,,(t -!)]'+  x,(t)'; 

1 

3  1 

IFr,  .(t)  =  0  THEN  LET  c:=l;  s:=0; 

Pr,  ,(t-l)  x,(t) 

ELSELETc:=-  -  ,  ; 

1  2 

END_IF; 

FOR  j  FROM  i+l  TOpDO 
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I 


I 


1 


2 


LET  x';=  (cXj(t)  -  sr^  .(t  -  1)) ; 


r,  j(t):=(cr,  j(t-l)  +  s*Xj(t)); 

1 

2 

- 

Xj(t):=x' 

END_DO: 

LET  a':=  (ca(t)  -  sUi(t  -  1)) ; 

1 

2 

_ 

Uj(i):=  (cu,(t-  l)  +  s*a(t)): 

1 

2 

- 

a(t):=a';  Y(t):=CY(t) 

END_DO;  {i  loop) 

- 

1 

- 

COMMENT  p'*"  order  filtered  residual  COMMENT 

LET  e(t,  t):=  (y(t)a(t)) 

- 

1 

- 

END_DO;  (tloop) 

p^+2p 

2p^+7p+l 

3p 

FINISH 

6.2  SF/FB  Narrow-band  Beamformer  Algorithm 

START 

add/ 

mult 

sqrt/ 

INITIALISE  (all  variables  :=0); 

FOR  t  FROM  1  DO 

subt 

divide 

LET  e(t,  t- I):=y(t);  6(t):=l; 

FOR  i  FROM  1  TO  p  DO 

LET  r,  ,(t).=  ,(t  -  1)  +b(t)  x,(t)  ; 

IF  r,  ,(t)  =  0  THEN  LET  c:=  1 ;  s:=0; 

1 

3 

■ 

P^r,  ,(t-l)  6(t)x,(t) 

ELSE  LET  c:=  -  ;  s:-  • 

- 

2 

2 

END_1F; 

FOR  j  FROM  1+1  TOp  DO 

LET  Xj{t):=  ( Xj(t)  -  r,  ^(t  -  1  )x,(t)) ; 

1 

1 

- 

1)  +  Xj(t)s*) 

1 

1 

. 

END_DO; 

LET  e(t,  t  -  1):=  (e(t.  t  -  1)  -  Ujd  -  Ux^d)) ; 

1 

1 

Ui(t):=  (Uj(i-  l)  +  s*e(t,t-  D); 

I 

1 

- 

6(t):=c6{i) 

. 

1 

. 

END_DO;  {i  loop) 

COMMENT  p""  order  filtered  residual  COMMENT 

LET  e(t,  t):*  (6(t)e(t,  t  -  1)) 

- 

I 

- 

END_DO;  |t  loop) 

p^+2p 

p^+7p+I 

2p 

RNISH 
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6.3  SQ/FF  Lattice  Algorithm 

START 

INITIALISE  {all  variables  ;=0); 
FOR  t  FROM  1  DO 


LET  Ogd)  :=  x(t);  a^(t)  ;=  x(t); 

ao(t):=y(t):  Yg(t)  :=  1.0; 


FOR  q  FROM  1  TO  p  DO 
LET .  ,(t)  :=  (£^  _  ,(t  -  1 ))  ^  + 1 _  ,(t)  ^ 

lFe^_,(t)  =  0  THEN  LET  c'(t+l):=  1;  s^(t  +  ]);=0; 

PE^.,(t-l)  (t) 

ELSE  LET  cUl  +  1 )  ;=  - ;  (t  +  1)  ^  ; 

END_IF; 

LET  pf  .  ,(t)  :=  (t)ppf  .  ,(t  -  I)  +  s'*(t)af  _  ,(l); 
af{t):=  cf(t)af_,(t)-s^(t)ppf_,(t-l); 


^‘q-  !<'>  •'  ,(t-  D  +  S^I  (t+  l)a^.,(t); 

a^(t)  :=  cf(t+  l)cx^.  ,(t)-sf(t  +  l)pPq_,(t-  !); 

Y^(t)  :=  c;(t+l)Y,.i(t): 

COMMENT  q-th  order  forward  prediction  residual  COMMENT 
eqO.O  :=  Y<,(t-l)af(t); 

COMMENT  q-th  order  filtered  residual  COMMENT 
Sqf'- ■=  fqiOa.jd): 

LET  ef  _  ,(t)  :=  .  PM _  ,(t  -  1 ))  ^  -► : .  ,(t)  ^ 

IF  .  ,(t)  =  0  THEN  LETc^(t)  ;=  1; 


ELSE  LET  c^(t)  :=  ^  ‘ 


s^(t)  ;=  0; 

.j.i.’r"’ 


END_IF; 

LET  p^.  ,(t -  1)  :=  c^(t)Pp^_  ,(t - 2) -E  s^‘(t)a^_  ,(t  -  I); 

a^(t)  ;=  c^(t)a^.,(t-  l)-s5(t)Pp^.,(t-2): 
COMMENT  q-th  order  backward  prediction  residual  COMMENT 
eq(t.t)  :»  Y,(t)a^(i): 


END_DO 


add/ 

subt 


I 


1 


1 

1 


1 


I 

1 


S 


mult  sqrt/ 
divide 


3  1 

1  2 

3 

3 

3 

3 

1 


3  1 

1  2 

3 

3 

E  i 
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COMMENT  order  Wiered  residual  COMMENT 

ep(t,t);=  Yp(t)ap(t)-.  -  1 

END_DO 

FINISH  g  27^ 


6.4  SQ/FF  QRD  Fast  Kalman 

START 

INITIALISE  (aU  variables  ;=  0}; 
FOR  q  FROM  1  TO  p  DO 

LETCq  :=  1.0; 

END_DO; 

LETYp:=  1.0;- 

FORtFROM  1  DO 


LET  Ogd)  :=  xu); 

FOR  q  FROM  1  TO  p  DO 
LET  u'(t)  ;=  CqPu'd-  l)  +  s“a^_,(t); 

END_DO; 

COMMENT  p'*"  order  forward  prediction  residual  COMMENT 

ep(t.  t)  ;=  Ypaf(t)  ; 


LETffpd)  :=  .  P^(ef(t- D)  +  aj, (t) ' 
IF  ef(t)  =  0  THEN  LET  c^.  I  :=  1.0 


ELSE  LET  c 


p+  1 


P+1 

/(t)  ■ 


='p+l  ■ 


*p+l  - 

_ 

‘  e^d)- 


0.0; 


END_IF; 

LETrp,, 

FOR  q  FROM  p  TO  1  DO 

LETsf_,d)  :=  .;(ej|d))^+  y^d)^; 

IF  _  (t)  =  0  THEN  LET  :=  1.0; 


*b 

s,: 


0.0; 


ELSE  LET  Cq 


END_IF; 


u^d) 


LETa^.,  ;-  6^a^  +  i^  a^d-1); 


add/  mult 
subt 


P  3p 

P  3p 


1  3 


1 


2 


P  2p 


P  2P 


6P 

sqtt/ 

divide 


1 

2 

P 

2p 
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END_DO 
LET  a,(t)  :=  Bq; 

LETYp:=  , 

FOR  q  FROM  p  TO  1  DO 

:=  ,  (Yq)^+  aq(t)  ^: 

lFy^_,  -  OTHENLETCq:=  I;  :=  0; 

Yq  a  (t) 

ELSE  LET  c  :=  ;=  ; 

<1  Y  “)  V 

'q-I  >'q-l 

END_IF; 

END_DO; 

LET  Ogd)  :=  y(t); 

FOR  q  FROM  1  TO  p  DO 
LET  Uq(t)  :=  CqPu^(t-l)  +  s*a^_,(t); 

END_IX); 

COMMENT  p'*'  order  filtered  residual  COMMENT 

Cpd.t)  :=  YpOpd); 

END_DO 

RNISH 


P  2p 

1  2  1 

p  2p  p 


2p 


P  3p 

P  3p 


1 
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